POLYNUCLEOTIDES AND POLYPEPTIDES ENCODED THEREBY
DISTANTLY HOMOLOGOUS TO HEPARANASE 

FIELD AND BACKGROUND OF THE INVENTION 
The present invention relates to novel polynucleotides encoding
polypeptides distantly homologous to heparanase, nucleic acid constructs
including the polynucleotides, genetically modified cells expressing same,
recombinant proteins encoded thereby, which may have heparanase or
other glycosyl hydrolase activity, antibodies recognizing the recombinant
proteins, oligonucleotides and oligonucleotide analogs derived from the
polynucleotides, and ribozymes including same.

Citation or identification of any reference in this application shall 
not be construed as an admission that such reference is available as prior 
art to the present invention. 

Glycosaminoglycans (GAGs)

GAGs are polymers of repeated disaccharide units consisting of
uronic acid and a hexosamine. Biosynthesis of all GAGs except hyaluronic
acid is initiated on a core protein. Proteoglycans may contain several
GAG side chains from similar or different families. GAGs are synthesized
as homopolymers, which may subsequently be modified by N-deacetylation
and N-sulfation, followed by C5-epimerization of glucuronic acid to
iduronic acid and O-sulfation. The chemical composition of GAGs varies
considerably between tissues.

The natural metabolism of GAGs in animals is carried out by
hydrolysis. Generally, GAGs are degraded in a two-step procedure.
First, the proteoglycans are internalized in endosomes, where initial
depolymerization of the GAG chain takes place. This step is mainly
hydrolytic and yields oligosaccharides. Further degradation is carried out
after fusion with lysosomes, where desulfation and exolytic
depolymerization to monosaccharides take place (42).

The only mammalian endolytic GAG-degrading enzymes
characterized so far are the hyaluronidases. The hyaluronidases are a
family of 1-4 endoglucosaminidases that depolymerize hyaluronic acid and
chondroitin sulfate. The cDNAs encoding sperm-associated PH-20
(Hyal3) and the lysosomal hyaluronidases Hyal1 and Hyal2 have been
cloned and published (27). These enzymes share an overall homology of
40 % and have different tissue specificities, cellular localizations and pH
optima.
Exolytic hydrolases are better characterized, among them
β-glucuronidase, α-L-iduronidase and β-N-acetylglucosaminidase. In
addition to hydrolysis of the glycosidic bonds of the polysaccharide chain,
GAG degradation involves desulfation, which is catalyzed by several
lysosomal sulfatases such as N-acetylgalactosamine-4-sulfatase,
iduronate-2-sulfatase and heparin sulfamidase. Deficiency in any of the
lysosomal GAG-degrading enzymes results in a lysosomal storage disease,
a mucopolysaccharidosis.

Glycosyl hydrolases: 

Glycosyl hydrolases are a widespread group of enzymes that
hydrolyze the O-glycosidic bond between two or more carbohydrates or
between a carbohydrate and a noncarbohydrate moiety. Enzymatic
hydrolysis of the glycosidic bond occurs by one of two major
mechanisms, leading to overall retention or inversion of the anomeric
configuration. In both mechanisms catalysis involves two residues: a
proton donor and a nucleophile. Glycosyl hydrolases have been classified
into 58 families based on amino acid similarities. The glycosyl hydrolases
of families 1, 2, 5, 10, 17, 30, 35, 39 and 42 act on a large variety of
substrates; however, they all hydrolyze the glycosidic bond by a general
acid catalysis mechanism, with retention of the anomeric configuration.
The mechanism involves two glutamic acid residues, which serve as the
proton donor and the nucleophile, with an asparagine always preceding the
proton donor. Analyses of a set of known 3D structures from this group
revealed that their catalytic domains, despite a low level of sequence
identity, adopt a similar (α/β)8 fold, with the proton donor and the
nucleophile located at the C-terminal ends of strands β4 and β7,
respectively. Mutations in the conserved functional amino acids of
lysosomal glycosyl hydrolases have been identified in lysosomal storage
diseases.

Lysosomal glycosyl hydrolases, including β-glucuronidase,
β-mannosidase, β-glucocerebrosidase, β-galactosidase and
α-L-iduronidase, are all exo-glycosyl hydrolases; they belong to the GH-A
clan and share a similar catalytic site. However, many endoglucanases
from various organisms, such as bacterial and fungal xylanases and
cellulases, share this catalytic domain (1).

Heparan sulfate proteoglycans (HSPGs)

HSPGs are ubiquitous macromolecules associated with the cell
surface and extracellular matrix (ECM) of a wide range of cells of
vertebrate and invertebrate tissues (3-7). The basic HSPG structure
consists of a protein core to which several linear heparan sulfate chains
are covalently attached. The polysaccharide chains are typically composed
of repeating hexuronic acid and D-glucosamine disaccharide units that are
substituted to a varying extent with N- and O-linked sulfate moieties and
N-linked acetyl groups (3-7). Studies on the involvement of ECM
molecules in cell attachment, growth and differentiation revealed a central
role of HSPGs in embryonic morphogenesis, angiogenesis, metastasis,
neurite outgrowth and tissue repair (3-7). The heparan sulfate (HS)
chains, which are unique in their ability to bind a multitude of proteins,
ensure that a wide variety of effector molecules cling to the cell surface
(6-8). HSPGs are also prominent components of blood vessels (5). In large
vessels they are concentrated mostly in the intima and inner media,
whereas in capillaries they are found mainly in the subendothelial
basement membrane, where they support proliferating and migrating
endothelial cells and stabilize the structure of the capillary wall. The
ability of HSPGs to interact with ECM macromolecules such as collagen,
laminin and fibronectin, and with different attachment sites on plasma
membranes, suggests a key role for this proteoglycan in the self-assembly
and insolubility of ECM components, as well as in cell adhesion and
locomotion. Cleavage of HS may therefore result in disassembly of the
subendothelial ECM and hence may play a decisive role in extravasation
of normal and malignant blood-borne cells (9-11). HS catabolism is
observed in inflammation, wound repair, diabetes and cancer metastasis,
suggesting that enzymes which degrade HS play important roles in
pathologic processes.

Heparanase 

Heparanase is a glycosylated enzyme that is involved in the
catabolism of certain glycosaminoglycans. It is an endoglucuronidase
that cleaves heparan sulfate at specific intrachain sites (12-15).
Interaction of T and B lymphocytes, platelets, granulocytes, macrophages
and mast cells with the subendothelial extracellular matrix (ECM) is
associated with degradation of heparan sulfate by heparanase activity (16).
Connective tissue activating peptide III (CTAP), an α-chemokine, was
found to have heparanase-like activity. Placental heparanase acts as an
adhesion molecule or as a degradative enzyme, depending on the pH of the
microenvironment (17).

Heparanase is released from intracellular compartments (e.g.,
lysosomes, specific granules) in response to various activation signals
(e.g., thrombin, calcium ionophores, immune complexes, antigens and
mitogens), suggesting its regulated involvement in inflammation and
cellular immune responses (16).

It was also demonstrated that heparanase can be readily released
from human neutrophils by a 60-minute incubation at 4 °C in the absence
of added stimuli (18).

Gelatinase, another ECM-degrading enzyme, which is found together
with heparanase in tertiary granules of human neutrophils, is secreted from
the neutrophils in response to phorbol 12-myristate 13-acetate (PMA)
treatment (19-20).

In contrast, various tumor cells appear to express and secrete 
heparanase in a constitutive manner in correlation with their metastatic 
potential (21). 

Degradation of heparan sulfate by heparanase results in the release
of heparin-binding growth factors, enzymes and plasma proteins that are
sequestered by heparan sulfate in basement membranes, extracellular
matrices and cell surfaces (22-23).

Heparanase activity has been described in a number of cell types,
including cultured skin fibroblasts, human neutrophils, activated rat
T-lymphocytes, normal and neoplastic murine B-lymphocytes, human
monocytes, human umbilical vein endothelial cells, SK hepatoma cells,
human placenta and human platelets.

A procedure for purification of natural heparanase was reported for
SK hepatoma cells and human placenta (U.S. Pat. No. 5,362,641) and for
the enzyme derived from human platelets (62).

Cloning and expression of the heparanase gene 
A purified fraction of heparanase isolated from human hepatoma
cells was subjected to tryptic digestion. Peptides were separated by high
pressure liquid chromatography (HPLC) and microsequenced. The
sequence of one of the peptides was used to screen databases for
homology to the corresponding back-translated DNA sequence. This
procedure led to the identification of a clone containing an insert of 1020
base pairs (bp), which included an open reading frame of 963 bp followed
by 27 bp of 3' untranslated region and a poly A tail. The new gene was
designated hpa. The missing 5' end of hpa was cloned by Marathon RACE
from placenta cDNA composite. The joined hpa cDNA fragment (also
referred to as phpa) contained an open reading frame which encodes a
polypeptide of 543 amino acids with a calculated molecular weight of
61,192 daltons (2). The cloning procedures are described at length in
U.S. Pat. application Nos. 08/922,170, 09/109,386 and 09/258,892, the
latter being a continuation-in-part of PCT/US98/17954, filed August 31,
1998, all of which are incorporated herein by reference.
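The database search described above relied on back-translating a peptide into the DNA sequences that could encode it. Because most amino acids have several codons, the number of candidate sequences (the degeneracy) grows multiplicatively with peptide length. The sketch below is a minimal illustration assuming the standard genetic code (NCBI translation table 1); the helper names are hypothetical, and the example peptides are not the hepatoma-derived peptide actually used.

```python
from itertools import product
from math import prod

# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, ..., GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_codon_table() -> dict[str, list[str]]:
    """Map each one-letter amino acid code ('*' = stop) to its codons."""
    table: dict[str, list[str]] = {}
    codons = ("".join(c) for c in product(BASES, repeat=3))
    for codon, aa in zip(codons, AMINO):
        table.setdefault(aa, []).append(codon)
    return table


CODONS = build_codon_table()


def degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that encode the peptide."""
    return prod(len(CODONS[aa]) for aa in peptide.upper())


def back_translate(peptide: str, limit: int = 10) -> list[str]:
    """Enumerate up to `limit` candidate coding sequences for a short peptide."""
    out: list[str] = []
    for combo in product(*(CODONS[aa] for aa in peptide.upper())):
        out.append("".join(combo))
        if len(out) >= limit:
            break
    return out


print(degeneracy("MWC"), back_translate("MWC"))
print(degeneracy("LSR"))
```

Peptides rich in Met and Trp (one codon each) give low-degeneracy probes, whereas Leu, Ser and Arg (six codons each) multiply the number of candidates quickly, which is why the choice of peptide matters for this kind of search.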

The genomic locus which encodes heparanase spans about 40 kb. It
is composed of 12 exons separated by 11 introns and is localized on
human chromosome 4.

The ability of the hpa gene product to catalyze degradation of
heparan sulfate (HS) in vitro was examined by expressing the entire open
reading frame of hpa in High Five and Sf21 insect cells and in the human
embryonic kidney 293 cell line as a mammalian expression system.
Extracts of infected or transfected cells were assayed for heparanase
catalytic activity. For this purpose, cell lysates were incubated with
sulfate-labeled, ECM-derived HSPG (peak I), followed by gel filtration
analysis (Sepharose 6B) of the reaction mixture. While the substrate alone
consisted of high molecular weight material, incubation of the HSPG
substrate with lysates of cells infected or transfected with hpa-containing
vectors resulted in complete conversion of the high molecular weight
substrate into low molecular weight labeled heparan sulfate degradation
fragments (see, for example, U.S. Pat. application No. 09/071,618, which
is incorporated herein by reference).

In other experiments, it was demonstrated that the heparanase
enzyme expressed by cells infected with a p¥hpa virus is capable of
degrading HS complexed to other macromolecular constituents (e.g.,
fibronectin, laminin, collagen) present in a naturally produced intact ECM
(see U.S. Pat. application No. 09/109,386, which is incorporated herein by
reference), in a manner similar to that reported for highly metastatic tumor
cells or activated cells of the immune system (7, 8).

Preferential expression of the hpa gene in human breast and
hepatocellular carcinomas

Semi-quantitative RT-PCR was applied to evaluate the expression
of the hpa gene by human breast carcinoma cell lines exhibiting different
degrees of metastasis. A marked increase in hpa gene expression,
correlating with metastatic capacity, was observed from the non-metastatic
MCF-7 to the moderately metastatic MDA 231 and the highly metastatic
MDA 435 breast carcinoma cell lines. Significantly, the differential
pattern of hpa gene expression correlated with the pattern of
heparanase activity.
Expression of the hpa gene in human breast carcinoma was
demonstrated by in situ hybridization to archival paraffin-embedded
human breast tissue. Hybridization of the heparanase antisense riboprobe
to invasive duct carcinoma tissue sections resulted in massive positive
staining localized specifically to the carcinoma cells. The hpa gene was
also expressed in areas adjacent to the carcinoma showing fibrocystic
changes. Normal breast tissue derived from reduction mammoplasty failed
to express the hpa transcript. High expression of the hpa gene was also
observed in tissue sections derived from human hepatocellular carcinoma
specimens but not in normal adult liver tissue. Furthermore, tissue
specimens derived from adenocarcinoma of the ovary, squamous cell
carcinoma of the cervix and colon adenocarcinoma exhibited strong
staining with the hpa RNA probe, as compared to very low staining of
the hpa mRNA in the respective non-malignant control tissues (2).

A preferential expression of heparanase in human tumors versus the 
corresponding normal tissues was also noted by immunohistochemical 
staining of paraffin embedded sections with monoclonal anti-heparanase 
antibodies. Positive cytoplasmic staining was found in neoplastic cells of 
the colon carcinoma and in dysplastic epithelial cells of a tubulovillous 
adenoma found in the same specimen while there was little or no staining 
of the normal looking colon epithelium located away from the carcinoma. 
Of particular significance was an intense immunostaining of colon 
adenocarcinoma cells that had metastasized into the liver, as compared to 
the surrounding normal liver tissue. 

Latent and active forms of the heparanase protein 
The apparent molecular size of the recombinant enzyme produced 
in the baculovirus expression system was about 65 kDa. This heparanase 
polypeptide contains 6 potential N-glycosylation sites. Following 
deglycosylation by treatment with peptide N-glycosidase, the protein 
appeared as a 57 kDa band. This molecular weight corresponds to the 
deduced molecular mass (61,192 daltons) of the 543 amino acid 
polypeptide encoded by the full length hpa cDNA after cleavage of the 
predicted 3 kDa signal peptide. No further reduction in the apparent size 
of the N-deglycosylated protein was observed following concurrent O- 
glycosidase and neuraminidase treatment. Deglycosylation had no 
detectable effect on enzymatic activity. 
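The comparison above between the deduced mass (61,192 Da), the predicted 3 kDa signal peptide and the 57 kDa deglycosylated band is arithmetic on average amino acid residue masses. A minimal sketch, assuming standard average residue masses and ignoring glycosylation and other modifications; the function names are hypothetical and the hpa sequence itself is not reproduced here.

```python
# Average residue masses in daltons (free amino acid mass minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the free N- and C-termini


def polypeptide_mass(seq: str) -> float:
    """Average mass (Da) of an unmodified linear polypeptide (no glycans)."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def mature_mass_kda(precursor_da: float, signal_peptide_da: float) -> float:
    """Mass left after removing an N-terminal signal peptide, in kDa."""
    return (precursor_da - signal_peptide_da) / 1000.0


# Figures quoted in the text: 61,192 Da precursor, ~3 kDa signal peptide.
print(f"{mature_mass_kda(61_192, 3_000):.1f} kDa")
```

Under these assumptions the mature polypeptide is about 58 kDa, close to the 57 kDa band observed after N-deglycosylation.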

Unlike the baculovirus enzyme, expression of the full length
heparanase polypeptide in mammalian cells (e.g., 293 kidney cells, CHO)
yielded a major protein of about 50 kDa and a minor protein of about
65 kDa in cell lysates. Preferential release of the about 65 kDa form into
the culture medium was noted in some of the transfected CHO clones.
Comparison of the enzymatic activity of the two forms, using a
semi-quantitative gel filtration assay, revealed that the 50 kDa enzyme is
about 100-fold more active than the 65 kDa form. A similar difference was
observed when the specific activity of the recombinant 65 kDa baculovirus
enzyme was compared to that of the 50 kDa heparanase preparations
purified from human platelets, SK-hep-1 cells or placenta. These results
suggest that the 50 kDa protein is a mature processed form of a latent
heparanase precursor. Amino terminal sequencing of the platelet
heparanase indicated that cleavage occurs between amino acids Glu157
and Lys158. As indicated by the hydropathic plot of heparanase, this site
is located within a hydrophilic peak, which is likely to be exposed and
hence accessible to proteases.
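A hydropathic plot such as the one mentioned above (and the profiles compared in FIG. 6) is obtained by averaging per-residue Kyte-Doolittle hydropathy values over a sliding window; stretches with negative averages are hydrophilic and tend to be surface exposed. A minimal sketch; the window size of 9 is an assumption (the window used for the actual plots is not stated here), and the function names are hypothetical.

```python
# Kyte & Doolittle (1982) hydropathy index for the 20 standard amino acids.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def hydropathy_profile(seq: str, window: int = 9) -> list[tuple[int, float]]:
    """(1-based center position, mean hydropathy) for every full window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    values = [KD[aa] for aa in seq.upper()]
    half = window // 2
    profile = []
    for start in range(len(values) - window + 1):
        mean = sum(values[start:start + window]) / window
        profile.append((start + half + 1, mean))
    return profile


def hydrophilic_positions(profile: list[tuple[int, float]],
                          threshold: float = 0.0) -> list[int]:
    """Window centers whose mean hydropathy falls below the threshold."""
    return [pos for pos, score in profile if score < threshold]


print(hydropathy_profile("KKKDDD", window=3))
```

Plotting the profiles of two proteins on a common residue axis gives the kind of side-by-side comparison described for heparanase and hnhpl.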

Involvement of Heparanase in Tumor Cell Invasion and Metastasis

Circulating tumor cells arrested in the capillary beds often attach at
or near the intercellular junctions between adjacent endothelial cells. Such
attachment of the metastatic cells is followed by rupture of the junctions,
retraction of the endothelial cell borders and migration through the breach
in the endothelium toward the exposed underlying basement membrane
(BM) (24). Once located between endothelial cells and the BM, the
invading cells must degrade the subendothelial glycoproteins and
proteoglycans of the BM in order to migrate out of the vascular
compartment. Several cellular enzymes (e.g., collagenase IV, plasminogen
activator, cathepsin B, elastase) are thought to be involved in degradation
of the BM (25). Among these enzymes is heparanase, which cleaves HS at
specific intrachain sites (16, 11). Expression of an HS-degrading
heparanase was found to correlate with the metastatic potential of mouse
lymphoma (26), fibrosarcoma and melanoma (21) cells. Moreover,
elevated levels of heparanase were detected in sera from metastatic
tumor-bearing animals and melanoma patients (21) and in tumor biopsies
of cancer patients (12).

The inhibitory effect of various non-anticoagulant species of
heparin on heparanase was examined in view of their potential use in
preventing extravasation of blood-borne cells. Treatment of experimental
animals with heparanase inhibitors markedly reduced (> 90 %) the
incidence of lung metastases induced by B16 melanoma, Lewis lung
carcinoma and mammary adenocarcinoma cells (12, 13, 28). Heparin
fractions with high and low affinity to anti-thrombin III exhibited
comparably high anti-metastatic activity, indicating that the
heparanase-inhibiting activity of heparin, rather than its anticoagulant
activity, plays a role in the anti-metastatic properties of the
polysaccharide (12).

The direct role of heparanase in cancer metastasis was
demonstrated in two experimental systems. The murine T-lymphoma cell
line Eb has no detectable heparanase activity. To investigate whether
introduction of the hpa gene into Eb cells would confer metastatic
behavior on these cells, Eb cells were transfected with a full length
human hpa cDNA. Stably transfected cells showed high expression of the
heparanase mRNA and enzyme activity. These hpa- and mock-transfected
Eb cells were injected subcutaneously into DBA/2 mice, and the mice were
tested for survival time and liver metastases. All mice (n=20) injected
with mock-transfected cells survived the first 4 weeks of the experiment,
while 50 % mortality was observed in mice inoculated with Eb cells
transfected with the hpa cDNA. The livers of mice inoculated with
hpa-transfected cells were infiltrated with numerous Eb lymphoma cells,
as was evident both by macroscopic evaluation of the liver surface and by
microscopic examination of tissue sections. In contrast, metastatic lesions
could not be detected by gross examination of the liver of mice inoculated
with mock-transfected control Eb cells, and few or no lymphoma cells were
found to infiltrate the liver tissue. In a different model of tumor
metastasis, transient transfection of the heparanase gene into
low-metastatic B16-F1 mouse melanoma cells, followed by i.v.
inoculation, resulted in a 4- to 5-fold increase in lung metastases.

Finally, heparanase externally adhered to B16-F1 melanoma cells
increased the level of lung metastases in C57BL mice as compared to
control mice (see U.S. Pat. application No. 09/260,037, entitled
INTRODUCING A BIOLOGICAL MATERIAL INTO A PATIENT, which is a
continuation-in-part of U.S. Pat. application No. 09/140,888 and is
incorporated herein by reference).

Possible involvement of heparanase in tumor angiogenesis 
Fibroblast growth factors are a family of structurally related
polypeptides characterized by high affinity to heparin (29). They are
highly mitogenic for vascular endothelial cells and are among the most
potent inducers of neovascularization (29-30). Basic fibroblast growth
factor (bFGF) has been extracted from a subendothelial ECM produced in
vitro (31) and from basement membranes of the cornea (32), suggesting
that ECM may serve as a reservoir for bFGF. Immunohistochemical
staining revealed the localization of bFGF in basement membranes of
diverse tissues and blood vessels (23). Despite the ubiquitous presence of
bFGF in normal tissues, endothelial cell proliferation in these tissues is
usually very low, suggesting that bFGF is somehow sequestered from its
site of action. Studies on the interaction of bFGF with ECM revealed that
bFGF binds to HSPG in the ECM and can be released in an active form by
HS-degrading enzymes (33, 32, 34). It was demonstrated that heparanase
activity expressed by platelets, mast cells, neutrophils and lymphoma cells
is involved in the release of active bFGF from ECM and basement
membranes (35), suggesting that heparanase activity may not only function
in cell migration and invasion but may also elicit an indirect neovascular
response. These results suggest that the ECM HSPG provides a natural
storage depot for bFGF and possibly other heparin-binding
growth-promoting factors (36, 37). Displacement of bFGF from its storage
within basement membranes and ECM may therefore provide a novel
mechanism for induction of neovascularization in normal and pathological
situations.

Recent studies indicate that heparin and HS are involved in the binding
of bFGF to high affinity cell surface receptors and in bFGF cell signaling
(38, 39). Moreover, the size of HS required for optimal effect was similar
to that of HS fragments released by heparanase (40). Similar results were
obtained with vascular endothelial growth factor (VEGF) (41),
suggesting the operation of a dual receptor mechanism involving HS in
cell interaction with heparin-binding growth factors. It is therefore
proposed that restriction of endothelial cell growth factors in the ECM
prevents their systemic action on the vascular endothelium, thus
maintaining a very low rate of endothelial cell turnover and vessel
growth. On the other hand, release of bFGF from storage in the ECM as a
complex with HS fragments may elicit localized endothelial cell
proliferation and neovascularization in processes such as wound healing,
inflammation and tumor development (36, 37).

The involvement of heparanase in other physiological processes 
and its potential therapeutic applications 

Apart from its involvement in tumor cell metastasis, inflammation
and autoimmunity, mammalian heparanase may be applied to modulate the
bioavailability of heparin-binding growth factors; cellular responses to
heparin-binding growth factors (e.g., bFGF, VEGF) and cytokines (IL-8)
(44, 41); cell interaction with plasma lipoproteins (49); cellular
susceptibility to certain viral and some bacterial and protozoan infections
(45-47); and disintegration of amyloid plaques (48).

Viral Infection: The presence of heparan sulfate on cell surfaces
has been shown to be the principal requirement for the binding of Herpes
Simplex (45) and Dengue (46) viruses to cells and for subsequent infection
of the cells. Removal of the cell surface heparan sulfate by heparanase
may therefore abolish virus infection. In fact, treatment of cells with
bacterial heparitinase (degrading heparan sulfate) or heparinase (degrading
heparin) reduced the binding of two related animal herpes viruses to cells
and rendered the cells at least partially resistant to virus infection (45).
There are some indications that cell surface heparan sulfate is also
involved in HIV infection (47).

Neurodegenerative diseases: Heparan sulfate proteoglycans were
identified in the prion protein amyloid plaques of Gerstmann-Straussler
Syndrome, Creutzfeldt-Jakob disease and scrapie (48). Heparanase may
disintegrate these amyloid plaques; amyloid plaques are also thought to
play a role in the pathogenesis of Alzheimer's disease.

Restenosis and Atherosclerosis: Proliferation of arterial smooth
muscle cells (SMCs) in response to endothelial injury and accumulation of
cholesterol-rich lipoproteins are basic events in the pathogenesis of
atherosclerosis and restenosis (50). Apart from its involvement in SMC
proliferation as a low affinity receptor for heparin-binding growth factors,
HS is also involved in lipoprotein binding, retention and uptake (51). It
was demonstrated that HSPG and lipoprotein lipase participate in a novel
catabolic pathway that may allow substantial cellular and interstitial
accumulation of cholesterol-rich lipoproteins (49). The latter pathway is
expected to be highly atherogenic by promoting accumulation of apoB-
and apoE-rich lipoproteins (e.g., LDL, VLDL, chylomicrons), independent
of feedback inhibition by the cellular cholesterol content. Removal of SMC
HS by heparanase is therefore expected to inhibit both SMC proliferation
and lipid accumulation and thus may halt the progression of restenosis and
atherosclerosis.

Pulmonary diseases: 

Data obtained from the literature suggest a possible role for
GAG-degrading enzymes, such as, but not limited to, heparanases,
connective tissue activating peptide, heparinases, hyaluronidases,
sulfatases and chondroitinases, in reducing the viscosity of sinus and
airway secretions, with associated implications for curtailing the rate of
infection and inflammation. The sputum of cystic fibrosis (CF) patients
contains at least 3 % GAGs, which contribute to its volume and viscous
properties. Recombinant heparanase has been shown to reduce the
viscosity of sputum of CF patients (see U.S. Pat. application No.
09/046,475).

In summary, heparanase may prove useful for conditions such
as wound healing, angiogenesis, restenosis, atherosclerosis, inflammation,
neurodegenerative diseases and viral infections. Mammalian heparanase
can be used to neutralize plasma heparin, as a potential replacement for
protamine. Anti-heparanase antibodies may be applied for
immunodetection and diagnosis of micrometastases, autoimmune lesions
and renal failure in biopsy specimens, plasma samples and body fluids.

There is thus a widely recognized need for, and it would be highly
advantageous to have, additional molecules with glycosyl hydrolase
activity, because such molecules may exhibit greater specific activity
toward certain substrates, or a different substrate specificity, than the
known heparanase.

SUMMARY OF THE INVENTION 

According to one aspect of the present invention there is provided
an isolated nucleic acid comprising a polynucleotide hybridizable with
SEQ ID NOs:1, 4, 6 or portions thereof at 68 °C in 6 x SSC, 1 % SDS,
5 x Denhardt's, 10 % dextran sulfate, 100 µg/ml salmon sperm DNA and
32P-labeled probe, and wash at 68 °C with 3 x SSC and 0.1 % SDS.

According to another aspect of the present invention there is
provided an isolated nucleic acid comprising a polynucleotide hybridizable
with SEQ ID NOs:1, 4, 6 or portions thereof at 68 °C in 6 x SSC, 1 % SDS,
5 x Denhardt's, 10 % dextran sulfate, 100 µg/ml salmon sperm DNA and
32P-labeled probe, and wash at 68 °C with 1 x SSC and 0.1 % SDS.

According to still another aspect of the present invention there is
provided an isolated nucleic acid comprising a polynucleotide hybridizable
with SEQ ID NOs:1, 4, 6 or portions thereof at 68 °C in 6 x SSC, 1 % SDS,
5 x Denhardt's, 10 % dextran sulfate, 100 µg/ml salmon sperm DNA and
32P-labeled probe, and wash at 68 °C with 0.1 x SSC and 0.1 % SDS.
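The three hybridization aspects above differ only in the salt concentration of the 68 °C wash (3 x, 1 x and 0.1 x SSC). Lowering the salt lowers the melting temperature (Tm) of a DNA duplex, so at a fixed wash temperature the lower-salt washes strip more imperfectly matched hybrids and are more stringent. The sketch below illustrates this trend with the empirical Meinkoth-Wahl approximation for long DNA-DNA hybrids without formamide, Tm = 81.5 + 16.6 log10[Na+] + 0.41 (%GC) - 500/L. It assumes that 1 x SSC contributes 0.195 M Na+ (0.15 M NaCl plus 0.015 M trisodium citrate), ignores the SDS, and uses a hypothetical 50 % GC, 1,700 bp probe; it does not state the Tm of any probe actually described here.

```python
import math

# Assumption: 1x SSC = 0.15 M NaCl + 0.015 M trisodium citrate,
# i.e. 0.15 + 3 * 0.015 = 0.195 M Na+. SDS is ignored.
NA_PER_1X_SSC = 0.15 + 3 * 0.015
WASH_TEMP_C = 68.0


def sodium_molar(ssc_x: float) -> float:
    """Approximate Na+ molarity of an n x SSC wash buffer."""
    return ssc_x * NA_PER_1X_SSC


def approx_tm(na_molar: float, gc_percent: float, length_bp: int) -> float:
    """Meinkoth-Wahl Tm estimate (deg C) for a perfectly matched long DNA hybrid."""
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 500 / length_bp


for ssc in (3.0, 1.0, 0.1):
    # Hypothetical probe: 50 % GC, 1,700 bp.
    tm = approx_tm(sodium_molar(ssc), gc_percent=50.0, length_bp=1700)
    print(f"{ssc:>4} x SSC: Tm ~ {tm:5.1f} C, Tm - wash = {tm - WASH_TEMP_C:5.1f} C")
```

Under these assumptions the estimated Tm falls from about 98 °C (3 x SSC) to about 90 °C (1 x SSC) and about 73 °C (0.1 x SSC), so only the 0.1 x SSC wash brings the 68 °C wash temperature close to the duplex Tm.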

According to yet another aspect of the present invention there is
provided an isolated nucleic acid comprising a polynucleotide at least
60 % identical with SEQ ID NOs:1, 4, 6 or portions thereof, as determined
using the Bestfit procedure of the DNA sequence analysis software
package developed by the Genetics Computer Group (GCG) at the
University of Wisconsin (gap creation penalty of 50, gap extension
penalty of 3).
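The identity thresholds above are defined with GCG Bestfit, which aligns the best-matching segments of two sequences using the Smith-Waterman local homology algorithm, with a gap penalty made of a creation term and an extension term. The sketch below is a hypothetical re-implementation for illustration only, not GCG's code: it uses Gotoh's affine-gap variant of Smith-Waterman, charges a gap of length k as 50 + 3k, and uses a toy nucleotide score (match +10, mismatch -9) that is an assumption rather than GCG's scoring matrix. Percent identity is computed here as identical columns over all alignment columns, gaps included; Bestfit's own denominator may differ, so the numbers are not interchangeable with Bestfit output.

```python
NEG_INF = float("-inf")


def local_align(a: str, b: str, match: int = 10, mismatch: int = -9,
                gap_open: int = 50, gap_ext: int = 3) -> tuple[str, str, float]:
    """Smith-Waterman local alignment with affine gaps (Gotoh's method).

    A gap of length k costs gap_open + gap_ext * k.
    Returns (aligned_a, aligned_b, score) for one best-scoring local alignment.
    """
    n, m = len(a), len(b)
    first = gap_open + gap_ext  # cost of the first position of a gap
    H = [[0.0] * (m + 1) for _ in range(n + 1)]      # best score ending at (i, j)
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends in a gap in a (consumes b)
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends in a gap in b (consumes a)
    best, bi, bj = 0.0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - first, E[i][j - 1] - gap_ext)
            F[i][j] = max(H[i - 1][j] - first, F[i - 1][j] - gap_ext)
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0.0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j

    # Trace back from the best cell until the zero score that starts the local alignment.
    out_a: list[str] = []
    out_b: list[str] = []
    i, j, state = bi, bj, "H"
    while i > 0 and j > 0:
        if state == "H":
            if H[i][j] == 0:
                break
            s = match if a[i - 1] == b[j - 1] else mismatch
            if H[i][j] == H[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
            elif H[i][j] == E[i][j]:
                state = "E"
            else:
                state = "F"
        elif state == "E":  # gap in a: consume one residue of b
            out_a.append("-")
            out_b.append(b[j - 1])
            if E[i][j] == H[i][j - 1] - first:
                state = "H"
            j -= 1
        else:  # state "F", gap in b: consume one residue of a
            out_a.append(a[i - 1])
            out_b.append("-")
            if F[i][j] == H[i - 1][j] - first:
                state = "H"
            i -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), best


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical columns divided by all alignment columns (gaps included)."""
    if not aln_a:
        return 0.0
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


x, y, score = local_align("ACGTACGTACTGTCAGTCAGT", "ACGTACGTACGTCAGTCAGT")
print(x, y, score, round(percent_identity(x, y), 1), sep="\n")
```

In this toy example a single inserted base is bridged by one gap because the flanking matches outweigh the 53-point cost of a length-1 gap; with fewer flanking matches the local alignment would instead stop at the insertion.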

According to still another aspect of the present invention there is
provided an isolated nucleic acid comprising a polynucleotide encoding a
polypeptide at least 60 % homologous with SEQ ID NOs:3, 5, 7 or
portions thereof, as determined using the Bestfit procedure of the DNA
sequence analysis software package developed by the Genetics Computer
Group (GCG) at the University of Wisconsin (gap creation penalty of 50,
gap extension penalty of 3).

According to further features in preferred embodiments of the 
invention described below, the polynucleotide is as set forth in SEQ ID 
NOs:l, 4, 6 or portions thereof. 

According to an additional aspect of the present invention there is
provided a recombinant protein comprising a polypeptide encoded by the
polynucleotides herein described.

According to yet an additional aspect of the present invention there
is provided a recombinant protein comprising a polypeptide at least 60 %
homologous with SEQ ID NOs:3, 5, 7 or portions thereof, as determined
using the Bestfit procedure of the DNA sequence analysis software
package developed by the Genetics Computer Group (GCG) at the
University of Wisconsin (gap creation penalty of 50, gap extension
penalty of 3).

According to further features in preferred embodiments of the
invention described below, the polypeptide is as set forth in SEQ ID
NOs:3, 5, 7 or portions thereof.

According to still an additional aspect of the present invention there 
is provided a nucleic acid construct comprising the isolated nucleic acid 
herein described. 

According to a further aspect of the present invention there is
provided a nucleic acid construct comprising a polynucleotide encoding
the recombinant protein herein described.

According to still a further aspect of the present invention there is
provided a host cell comprising a polynucleotide or construct and/or
expressing a recombinant protein as herein described.

According to yet a further aspect of the present invention there is
provided an antisense oligonucleotide or nucleic acid construct comprising
a polynucleotide or a polynucleotide analog of at least 10 bases which is
hybridizable in vivo, under physiological conditions, with (i) a portion of
a polynucleotide strand encoding a polypeptide at least 60 % homologous
with SEQ ID NOs:3, 5, 7 or portions thereof, as determined using the
Bestfit procedure of the DNA sequence analysis software package
developed by the Genetics Computer Group (GCG) at the University of
Wisconsin (gap creation penalty of 50, gap extension penalty of 3); or (ii)
a portion of a polynucleotide strand at least 60 % identical with SEQ ID
NOs:1, 4, 6 or portions thereof, as determined using the Bestfit procedure
of the DNA sequence analysis software package developed by the Genetics
Computer Group (GCG) at the University of Wisconsin (gap creation
penalty of 50, gap extension penalty of 3).

According to another aspect of the present invention there is 
provided a ribozyme comprising the antisense oligonucleotide herein 
described and a ribozyme sequence. 

The present invention provides polynucleotides and polypeptides
belonging to a class of asp-glu glycosyl hydrolases of the GH-A clan
which, based on their homology to heparanase, are probably
GAG-degrading enzymes.

BRIEF DESCRIPTION OF THE DRAWINGS
The invention is herein described, by way of example only, with
reference to the accompanying drawings, wherein:

FIG. 1 shows the nucleotide sequence (SEQ ID NOs:1-2) and the
deduced amino acid sequence (SEQ ID NOs:2-3) of hnhpl;

FIG. 2 is a comparison of the deduced amino acid sequences of
hnhpl (SEQ ID NOs:2-3) and of heparanase (SEQ ID NO:9). The comparison
was performed using the Gap program of the GCG package (gap creation
penalty = 50, gap extension penalty = 3);

FIG. 3 illustrates the variability of hnhpl transcripts. Hnhpl was
amplified from placenta and testis Marathon-Ready cDNA libraries
using the gene-specific primers pn9-312u (SEQ ID NO:14) and hn 11-230
(SEQ ID NO:11);

FIG. 4 shows a zoo blot. Ten micrograms of genomic DNA from
various species were digested with EcoRI and separated on a 0.7 %
agarose-TBE gel. Following electrophoresis, the gel was treated with HCl
and then with NaOH, and the DNA fragments were transferred downward
onto a nylon membrane (Hybond N+, Amersham) with 0.4 N NaOH. The
membrane was hybridized with a 1.7 kb DNA probe containing the
hnhpl cDNA (clone pn9). Lane order: H - Human; M - Mouse; Rt - Rat;
P - Pig; Cw - Cow; Hr - Horse; S - Sheep; Rb - Rabbit; D - Dog;
Ch - Chicken; F - Fish. Size markers (Lambda BstEII) are shown on the left;

FIG. 5 illustrates a cross hybridization test between hpa and hnhpl. Hpa
was amplified by PCR from a Marathon-Ready placenta cDNA library.
Hnhpl was amplified from a testis Marathon-Ready cDNA library. PCR
products were run on an agarose gel in duplicate and transferred to a nylon
membrane. One membrane was probed with 32P-labeled hpa cDNA and
the other with hnhpl, clone pn9.

FIG. 6 is a comparison of the hydropathic profiles of heparanase
and hnhpl. The curves were calculated according to the Kyte and Doolittle
method over a window of 17 amino acids.
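The sliding-window calculation behind such hydropathy curves can be sketched as follows. This is a minimal illustration, assuming the published Kyte-Doolittle hydropathy values and a window of 17 residues in which only full windows are scored; the function name `hydropathy_profile` is hypothetical, and the edge handling and plotting actually used for FIG. 6 are not stated here.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=17):
    """Return (center_position, mean_hydropathy) for every full window.

    Positions are 1-based and refer to the residue at the window center.
    Windows that would run off either end are skipped (an assumption; other
    tools pad or truncate instead). Non-standard residue letters raise KeyError.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    half = window // 2
    profile = []
    for start in range(len(values) - window + 1):
        mean = sum(values[start:start + window]) / window
        profile.append((start + half + 1, mean))
    return profile


if __name__ == "__main__":
    # Toy sequence, not hnhpl or heparanase; positive values are hydrophobic.
    for position, score in hydropathy_profile("MKVLIIAGLLAVSSAEDRKKEQLLIV"):
        print(position, round(score, 2))
```

Superimposing two such profiles after aligning the sequences is one way to make the kind of comparison described for the two proteins.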

FIG. 7 shows a Western blot analysis of recombinant hnhpl
expressed in human embryonic kidney 293 cells. A - control heparanase-FLAG
precursor; B-D - 293 cells transfected with a control pSI vector (B),
pSI-pn6 (C) and pSI-pn9 (D). Cell extracts were separated by SDS-PAGE
and transferred onto an Immobilon-P nylon membrane (Millipore). The
membrane was incubated with anti-FLAG antibody at 1:1000 (Kodak
anti-FLAG M2, cat. IB 13025).

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention is of novel polynucleotides encoding
polypeptides distantly homologous to heparanase, nucleic acid constructs
including the polynucleotides, genetically modified cells expressing same,
recombinant proteins encoded thereby and which may have heparanase or
other glycosyl hydrolase activity, antibodies recognizing the recombinant
proteins, oligonucleotides and oligonucleotide analogs derived from the
polynucleotides and ribozymes including same.

The principles and operation of the present invention may be better 
understood with reference to the drawings and accompanying descriptions. 

Before explaining at least one embodiment of the invention in
detail, it is to be understood that the invention is not limited in its
application to the details of construction and the arrangement of the
components set forth in the following description or illustrated in the
drawings. The invention is capable of other embodiments or of being
practiced or carried out in various ways. Also, it is to be understood that
the phraseology and terminology employed herein is for the purpose of
description and should not be regarded as limiting.
While reducing the present invention to practice, the human EST
database was screened for homologous sequences using the entire amino
acid sequence of human heparanase (SEQ ID NO:9). A distantly
homologous fragment was pulled out: accession number AI222323,
IMAGE clone number 1843155, from the Soares_NFL_T_GBC_S1 Homo
sapiens cDNA library prepared from testis, B cells and fetal lung. The
clone contained an insert of 560 bp (SEQ ID NO:23), the 3' region of
which was homologous to the human hpa gene encoding human
heparanase. Primers derived from the newly identified clone were used to
isolate several cDNAs, including several open reading frames that reflect
in-frame alternative splicing. The longest of these, pn6, appears in Figure
1 (SEQ ID NOs:1, 2 and 3); it is 2,060 nucleotides long and contains an open
reading frame of 1,776 nucleotides, which encodes a polypeptide of 592
amino acids with a calculated molecular weight of 66.5 kDa. The newly
cloned gene was designated hnhpl. Two shorter forms, pn9 and pn5, and
their deduced amino acid sequences are set forth in SEQ ID NOs:4 and 6
and SEQ ID NOs:5 and 7, respectively, and are further described in the
Examples section that follows. A comparison between the amino acid
sequences of hnhpl and heparanase is shown in Figure 2. The homology
between the two proteins is 52.8 % or 55.3 %, depending on the software
employed. No cross hybridization was detected between hpa and hnhpl,
even under low-stringency wash conditions (Figure 5). Zoo blot analysis
demonstrated that the hnhpl gene and other related genes, perhaps forming
a new gene family, are present in the genomes of other organisms, including
mammals and birds. The chromosomal localization of hnhpl was
determined, using the G3 radiation hybrid panel, to be on human chromosome
10, next to the marker SHGC-57721. The results also indicated the
possibility of a second copy of the gene or of a related gene. The hnhpl
gene is expressed at low levels in lymph nodes, spleen, colon and ovary; at
slightly higher levels in prostate and small intestine; and at a still more
pronounced level in testis. No expression was detected, under the assay
employed, in bone marrow, liver, thymus, tonsil or leukocytes. Screening
of the mouse EST database with the amino acid sequences of heparanase and
of hnhpl pulled out a mouse EST clone (clone 1378452, accession
number AI019269, from mouse thymus; SEQ ID NO:8). However, this
clone includes two frameshift mutations that disrupt its open reading
frame.
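The ORF figures quoted above rest on locating the longest open reading frame in a cDNA. A minimal sketch of that step, assuming a forward-strand scan from ATG to the first in-frame stop codon in each of the three reading frames; `longest_orf` is a hypothetical helper, and conventions differ on whether the stop codon is counted in the reported ORF length, so the sketch returns the protein length separately.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(cdna):
    """Find the longest ATG...stop open reading frame on the forward strand.

    Returns (start, end, protein_length): start is the 0-based index of the
    ATG, end is exclusive and includes the stop codon, and protein_length
    excludes the stop codon. Returns None when no complete ORF exists.
    """
    seq = cdna.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i  # first in-frame ATG opens a candidate ORF
            elif codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end, (end - start) // 3 - 1)
                start = None  # look for the next ATG in this frame
    return best


if __name__ == "__main__":
    print(longest_orf("CCATGAAATTTTAAGG"))  # (2, 14, 3)
```

A production analysis would also scan the reverse strand and may report ORFs lacking a stop codon at the 3' end; both are left out here for brevity.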
The overall homology between the amino acid sequences of hnhpl
and heparanase suggests that these two proteins share a similar function. The
homology between the two proteins is concentrated in several regions,
which may represent functional domains of the protein. The variable regions
may reflect potential differences in substrate recognition, cellular
localization and activity parameters.

Despite the lack of overall homology between heparanase
and other glycosyl hydrolases, the amino acid couple asn-glu (NE, SEQ ID
NO:13), which is characteristic of the proton donor of glycosyl hydrolases
of the GH-A clan, was found at positions 224-225 of heparanase. As in
other clan members, this NE couple is located at the end of a β strand. As
shown in Figure 2, the region surrounding the NE couple is conserved in
the predicted amino acid sequence of hnhpl. This suggests that the hnhpl
product is a glycosyl hydrolase, a definition that may include any
polysaccharide degrading enzyme, whether an exo- or an endoglycosidase.
Based on its similarity to heparanase, hnhpl likely encodes a GAG
degrading enzyme.
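Locating candidate NE couples in a deduced protein sequence can be sketched as a simple motif scan, assuming the 1-based residue numbering used above. This is only an illustration: a bare NE pair occurs often by chance, and the assignment described here also relied on alignment with heparanase and on the couple's position at the end of a β strand, neither of which this scan assesses. `find_motif` is a hypothetical name.

```python
import re


def find_motif(protein, motif="NE"):
    """Return 1-based start positions of every (possibly overlapping) match."""
    # A zero-width lookahead lets overlapping occurrences be reported.
    pattern = "(?=" + re.escape(motif.upper()) + ")"
    return [match.start() + 1 for match in re.finditer(pattern, protein.upper())]


if __name__ == "__main__":
    # Toy peptide: NE couples begin at residues 2 and 5.
    print(find_motif("MNEANEK"))  # [2, 5]
```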

In addition, superimposition of the hydropathic profiles of
heparanase and hnhpl (Figure 6) reveals an overlapping pattern along
the proteins. The amino acid sequence characteristic of glycosyl
hydrolases is located within a hydrophilic peak and at the same position in
the aligned proteins. A marked difference in the hydropathic pattern is
noticed around amino acids 157-158 of heparanase, which constitute the
processing site of the enzyme. While in heparanase this site is located at
the tip of a hydrophilic peak, the equivalent region of hnhpl is not
particularly hydrophilic. The peak around amino acid 110 of heparanase
also appears around amino acid 130 of hnhpl. Cleavage of heparanase in
this region has been shown to result in enzyme activation; the equivalent
region of hnhpl might therefore be a potential processing site.

Heparanase has a potential signal peptide at the N-terminus of the
67 kDa form. The homology between the two proteins is low at the
N-termini, and no signal peptide was identified in the hnhpl polypeptide.

According to one aspect of the present invention there is provided
an isolated nucleic acid comprising a polynucleotide hybridizable with
SEQ ID NOs:1, 4, 6 or portions thereof at 68 °C in 6 x SSC, 1 % SDS,
5 x Denhardt's solution, 10 % dextran sulfate, 100 μg/ml salmon sperm DNA
and a 32P-labeled probe, with washing at 68 °C in 3 x SSC, 1 x SSC or
0.1 x SSC and 0.1 % SDS.
As used herein in the specification and in the claims section that
follows, the term "portion" or "portions" refers to a consecutive stretch of
nucleic or amino acids. Such a portion may include, for example, at least
90 nucleotides (equivalent to at least 30 amino acids), at least 120
nucleotides (equivalent to at least 40 amino acids), at least 150 nucleotides
(equivalent to at least 50 amino acids), at least 180 nucleotides (equivalent
to at least 60 amino acids), at least 210 nucleotides (equivalent to at least
70 amino acids), at least 300 nucleotides (equivalent to at least 100 amino
acids), at least 600 nucleotides (equivalent to at least 200 amino acids), at
least 900 nucleotides (equivalent to at least 300 amino acids), at least
1,200 nucleotides (equivalent to at least 400 amino acids), at least 1,500
nucleotides (equivalent to at least 500 amino acids), or more.

According to another aspect of the present invention there is
provided an isolated nucleic acid comprising a polynucleotide at least 60 %,
preferably at least 65 %, more preferably at least 70 %, still preferably
at least 75 %, yet preferably at least 80 %, more preferably at least 85 %,
more preferably at least 90 %, most preferably at least 95 % - 100 %,
identical with SEQ ID NOs:1, 4, 6 or portions thereof as determined using
the Bestfit procedure of the DNA sequence analysis software package
developed by the Genetic Computer Group (GCG) at the University of
Wisconsin (gap creation penalty = 50, gap extension penalty = 3).
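The identity percentages above depend on an alignment scored with a gap creation penalty of 50 and a gap extension penalty of 3. The sketch below does not reproduce GCG Bestfit. It assumes an alignment has already been produced, assumes the common affine form in which a gap of length L costs creation + extension × L, and counts identity over aligned columns in which neither sequence has a gap (one common convention). All function names are hypothetical.

```python
def gap_penalty(length, creation=50, extension=3):
    """Affine gap cost, assuming penalty = creation + extension * length."""
    if length <= 0:
        return 0
    return creation + extension * length


def gap_runs(aligned):
    """Lengths of consecutive '-' runs in one row of an alignment."""
    runs, current = [], 0
    for symbol in aligned:
        if symbol == "-":
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def percent_identity(aligned_a, aligned_b):
    """Percent of gap-free aligned columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    top, bottom = "ACGT--ACGA", "ACGTTTACGT"  # toy alignment
    print(percent_identity(top, bottom))  # 87.5
    print(sum(gap_penalty(n) for n in gap_runs(top)))  # 56
```

Because the gap creation penalty is large relative to the extension penalty, an aligner using these values prefers one long gap to several short ones.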

According to still another aspect of the present invention there is
provided an isolated nucleic acid comprising a polynucleotide encoding a
polypeptide being at least 60 %, preferably at least 65 %, more preferably
at least 70 %, still preferably at least 75 %, yet preferably at least 80 %,
more preferably at least 85 %, more preferably at least 90 %, most
preferably at least 95 % - 100 %, homologous with SEQ ID NOs:3, 5, 7 or
portions thereof as determined using the Bestfit procedure of the DNA
sequence analysis software package developed by the Genetic Computer
Group (GCG) at the University of Wisconsin (gap creation penalty = 50,
gap extension penalty = 3).

As used herein in the specification and in the claims section that
follows, the term "homologous" refers to identity plus similarity, i.e., the
sum of identical and similar residues.
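Under this definition, a homology figure adds similar residues to identical ones. A minimal sketch, assuming a pre-aligned pair of protein sequences and a hypothetical grouping of chemically similar amino acids; GCG programs judge similarity from a scoring matrix instead, so the numbers produced here will not match Bestfit output.

```python
# Hypothetical similarity groups; a real analysis would use a scoring matrix.
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG"]


def similar(a, b):
    """True when two different residues fall in the same similarity group."""
    return a != b and any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_homology(aligned_a, aligned_b):
    """Return (percent identity, percent homology = identical + similar).

    Both percentages are computed over gap-free aligned columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not columns:
        return 0.0, 0.0
    identical = sum(1 for a, b in columns if a == b)
    alike = sum(1 for a, b in columns if similar(a, b))
    total = len(columns)
    return 100.0 * identical / total, 100.0 * (identical + alike) / total


if __name__ == "__main__":
    print(identity_and_homology("MKVLE", "MRILD"))  # (40.0, 100.0)
```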

According to an additional aspect of the present invention there is
provided a recombinant protein comprising a polypeptide encoded by the
polynucleotides herein described.
The nucleic acid according to the present invention can be a
complementary polynucleotide sequence, a genomic polynucleotide
sequence or a composite polynucleotide sequence.

As used herein the phrase "complementary polynucleotide
sequence" includes sequences which originally result from reverse
transcription of messenger RNA using a reverse transcriptase or any other
RNA dependent DNA polymerase. Such sequences can be subsequently
amplified in vivo or in vitro using a DNA dependent DNA polymerase.

As used herein the phrase "genomic polynucleotide sequence"
includes sequences which originally derive from a chromosome and reflect
a contiguous portion of a chromosome.

As used herein the phrase "composite polynucleotide sequence"
includes sequences which are at least partially complementary and at least
partially genomic. A composite sequence can include some exonic
sequences required to encode a polypeptide, as well as some intronic
sequences interposed therebetween. The intronic sequences can be of any
source, including other genes, and typically will include conserved
splicing signal sequences. Such intronic sequences may further include cis
acting expression regulatory elements.

Thus, this aspect of the present invention encompasses (i)
polynucleotides as set forth in SEQ ID NOs:1, 4 and 6; (ii) fragments or
portions thereof; (iii) sequences hybridizable therewith; (iv) sequences
homologous thereto; (v) genomic and composite sequences corresponding
thereto; (vi) sequences encoding similar polypeptides with different codon
usage; and (vii) altered sequences characterized by mutations, such as
deletion, insertion or substitution of one or more nucleotides, either
naturally occurring or man-induced, either randomly or in a targeted
fashion.

According to yet an additional aspect of the present invention there
is provided a recombinant protein comprising a polypeptide at least 60 %,
preferably at least 65 %, more preferably at least 70 %, still preferably at
least 75 %, yet preferably at least 80 %, more preferably at least 85 %,
more preferably at least 90 %, most preferably at least 95 % - 100 %,
homologous with SEQ ID NOs:3, 5, 7 or portions thereof, as determined
using the Bestfit procedure of the DNA sequence analysis software
package developed by the Genetic Computer Group (GCG) at the
University of Wisconsin (gap creation penalty = 50, gap extension
penalty = 3).
According to still an additional aspect of the present invention there
is provided a nucleic acid construct comprising the isolated nucleic acid
herein described.

According to a preferred embodiment of the present invention, the
nucleic acid construct further comprises a promoter for regulating the
expression of the isolated nucleic acid in a sense or antisense orientation.
Promoters are sequence elements required for transcription, as they
serve to bind the DNA dependent RNA polymerase, which transcribes
sequences located downstream thereof. Such downstream sequences can be
in either of two possible orientations, resulting in the transcription of
sense RNA, which is translatable by the ribosomal machinery, or of
antisense RNA, which typically does not contain translatable sequences
yet can form a duplex or triplex with endogenous sequences, either
mRNA or chromosomal DNA, and hamper gene expression, all as further
detailed hereinunder.

While the isolated nucleic acid described herein is an essential
element of the invention, it is modular and can be used in different
contexts. The promoter of choice used in conjunction with this
invention is of secondary importance and may be any suitable
promoter. It will be appreciated by one skilled in the art, however, that
the transcription start site(s) must be located upstream of an open
reading frame. In a preferred embodiment of the present invention, the
selected promoter comprises an element that is active in the particular
host cells of interest. Such elements may be selected from transcriptional
regulators that activate the transcription of genes essential for the
survival of these cells under conditions of stress or starvation, including,
but not limited to, the heat shock proteins.

A construct according to the present invention preferably further
includes an appropriate selectable marker. In a more preferred
embodiment according to the present invention, the construct further
includes an origin of replication. In another most preferred embodiment
according to the present invention, the construct is a shuttle vector, which
can propagate both in E. coli (wherein the construct comprises an
appropriate selectable marker and origin of replication) and in cells of an
organism of choice, or integrate into the genome of such an organism.
The construct according to this aspect of the present invention can
be, for example, a plasmid, a bacmid, a phagemid, a cosmid, a phage, a
virus or an artificial chromosome.



WO 01/00643                                        PCT/IL00/00358

Alternatively, the nucleic acid construct according to this aspect of
the present invention further includes positive and negative selection
markers and may therefore be employed for selecting for homologous
recombination events, including, but not limited to, homologous
recombination employed in knock-in and knock-out procedures. One of
ordinary skill in the art can readily design knock-out or knock-in
constructs including both positive and negative selection genes for
efficiently selecting transfected embryonic stem cells that have undergone
a homologous recombination event with the construct. Such cells can be
introduced into developing embryos to generate chimeras, the offspring of
which can be tested for carrying the knock-out or knock-in constructs.
Knock-out and/or knock-in constructs according to the present invention
can be used to further investigate the functionality of the new gene. Such
constructs can also be used in somatic and/or germ cell gene therapy to
abolish the activity of a defective gain-of-function allele or to compensate
for the lack of activity of a silent allele in an organism, thereby down- or
upregulating activity, as required. Further detail relating to the
construction and use of knock-out and knock-in constructs can be found in
Fukushige, S. and Ikeda, J.E.: Trapping of mammalian promoters by
Cre-lox site-specific recombination. DNA Res 3 (1996) 73-80; Bedell,
M.A., Jenkins, N.A. and Copeland, N.G.: Mouse models of human disease.
Part I: Techniques and resources for genetic analysis in mice. Genes and
Development 11 (1997) 1-11; Bermingham, J J., Scherer, S.S., O'Connell,
S., Arroyo, E., Kalla, K.A., Powell, F.L. and Rosenfeld, M.G.:
Tst-1/Oct-6/SCIP regulates a unique step in peripheral myelination and is
required for normal respiration. Genes Dev 10 (1996) 1751-62, which are
incorporated herein by reference.

According to yet another aspect of the present invention there is
provided a host cell or animal comprising a nucleic acid construct or a
portion thereof as described herein. Methods of transforming host cells,
both prokaryotic and eukaryotic, and organisms with nucleic acid
constructs, and of selecting transformants (e.g., transformed cells or
transgenic animals), are well known to those of skill in the art. In
addition, once transfected, such cells and organisms can be designed to
direct the production of ample amounts of a recombinant protein, which
can then be purified by known methods, including, but not limited to,
various chromatography and gel electrophoresis methods. Such a purified
recombinant protein can serve for the elicitation of antibodies, as further
detailed hereinunder. Methods of transformation of cells and organisms are
described in detail in reference 43, whereas methods of recombinant
protein purification are described in detail in reference 52, both of which
are incorporated herein by reference.

According to still another aspect of the present invention there is
provided an oligonucleotide of at least 17, at least 18, at least 19, at least
20, at least 22, at least 25, at least 30 or at least 40 bases specifically
hybridizable with the isolated nucleic acid described herein.

Hybridization of shorter nucleic acids (below 200 bp in length, e.g.,
17-40 bp in length) is effected by stringent, moderate or mild
hybridization.

Stringent hybridization is effected by a hybridization solution of
6 x SSC and 1 % SDS or 3 M TMACl, 0.01 M sodium phosphate (pH 6.8),
1 mM EDTA (pH 7.6), 0.5 % SDS, 100 μg/ml denatured salmon sperm DNA
and 0.1 % nonfat dried milk, a hybridization temperature of 1-1.5 °C below
the Tm, and a final wash solution of 3 M TMACl, 0.01 M sodium phosphate
(pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS at 1-1.5 °C below the Tm.

Moderate hybridization is effected by a hybridization solution of
6 x SSC and 0.1 % SDS or 3 M TMACl, 0.01 M sodium phosphate (pH 6.8),
1 mM EDTA (pH 7.6), 0.5 % SDS, 100 μg/ml denatured salmon sperm DNA
and 0.1 % nonfat dried milk, a hybridization temperature of 2-2.5 °C below
the Tm, a final wash solution of 3 M TMACl, 0.01 M sodium phosphate
(pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS at 1-1.5 °C below the Tm, a final
wash solution of 6 x SSC, and a final wash at 22 °C.

Mild hybridization is effected by a hybridization solution of 6 x SSC
and 1 % SDS or 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA
(pH 7.6), 0.5 % SDS, 100 μg/ml denatured salmon sperm DNA and 0.1 %
nonfat dried milk, a hybridization temperature of 37 °C, a final wash
solution of 6 x SSC and a final wash at 22 °C.
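The stringent and moderate hybridization temperatures above are expressed relative to the Tm of the oligonucleotide. As a rough illustration only, the sketch below estimates Tm with the Wallace rule (2 °C per A or T, 4 °C per G or C), an approximation intended for short oligonucleotides in about 1 M sodium. It is not the method specified here and does not model TMACl buffers, in which Tm depends mainly on length. The function names are hypothetical.

```python
def wallace_tm(oligo):
    """Approximate Tm (°C) of a short DNA oligonucleotide by the Wallace rule."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


# Degrees below Tm for the stringent and moderate conditions described above.
OFFSETS_BELOW_TM = {"stringent": (1.0, 1.5), "moderate": (2.0, 2.5)}


def hybridization_range(oligo, stringency):
    """Return (low, high) hybridization temperature in °C for a stringency level."""
    if stringency == "mild":
        return (37.0, 37.0)  # mild hybridization uses a fixed 37 °C
    near, far = OFFSETS_BELOW_TM[stringency]
    tm = wallace_tm(oligo)
    return (tm - far, tm - near)


if __name__ == "__main__":
    probe = "ACGTACGTACGTACGTAC"  # toy 18-mer, not an hnhpl probe
    print(wallace_tm(probe))  # 54.0
    print(hybridization_range(probe, "stringent"))  # (52.5, 53.0)
```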

According to an additional aspect of the present invention there is
provided a pair of oligonucleotides each independently of at least 17, at
least 18, at least 19, at least 20, at least 22, at least 25, at least 30 or at
least 40 bases specifically hybridizable with the isolated nucleic acid
described herein in an opposite orientation so as to direct exponential
amplification of a portion thereof in a nucleic acid amplification reaction,
such as a polymerase chain reaction. The polymerase chain reaction and
other nucleic acid amplification reactions are well known in the art and
require no further description herein. The pair of oligonucleotides
according to this aspect of the present invention are preferably selected to
have compatible melting temperatures (Tm), e.g., melting temperatures
which differ by less than 7 °C, preferably less than 5 °C, more
preferably less than 4 °C, most preferably less than 3 °C, ideally between
3 °C and 0 °C. Consequently, according to yet an additional aspect of
the present invention there is provided a nucleic acid amplification product
obtained using the pair of primers described herein. Such a nucleic acid
amplification product can be isolated by gel electrophoresis or any other
size-based separation technique. Alternatively, such a nucleic acid
amplification product can be isolated by affinity separation, based either
on strandedness or on sequence. In addition, once isolated, such a
product can be further genetically manipulated by restriction, ligation and
the like, to serve any one of a plurality of applications associated with up-
and/or downregulation of activity.
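Checking that a candidate primer pair has compatible melting temperatures can be sketched as follows. The Wallace-rule Tm is an assumption chosen for simplicity (primer design software usually applies nearest-neighbor thermodynamics), the 5 °C default mirrors one of the preferred limits above, and `primers_compatible` is a hypothetical name.

```python
def wallace_tm(oligo):
    """Approximate Tm (°C) of a short oligonucleotide: 2 per A/T, 4 per G/C."""
    seq = oligo.upper()
    return 2.0 * (seq.count("A") + seq.count("T")) + 4.0 * (seq.count("G") + seq.count("C"))


def primers_compatible(forward, reverse, max_difference=5.0):
    """True when the estimated Tm values differ by less than max_difference °C."""
    return abs(wallace_tm(forward) - wallace_tm(reverse)) < max_difference


if __name__ == "__main__":
    fwd = "ACGTACGTACGTACGTAC"  # toy primers, estimated Tm 54 and 52 °C
    rev = "ACGTACGTACGTACGTAA"
    print(primers_compatible(fwd, rev))  # True
```

Tightening `max_difference` to 3.0 or 4.0 corresponds to the stricter preferred ranges listed above.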

According to still an additional aspect of the present invention there
is provided an antisense oligonucleotide comprising a polynucleotide or a
polynucleotide analog of at least 10 bases, preferably between 10 and 15,
more preferably between 50 and 20 bases, most preferably at least 17, at
least 18, at least 19, at least 20, at least 22, at least 25, at least 30 or at
least 40 bases, being hybridizable in vivo, under physiological conditions,
with (i) a portion of a polynucleotide strand encoding a polypeptide at least
60 %, preferably at least 65 %, more preferably at least 70 %, still
preferably at least 75 %, yet preferably at least 80 %, more preferably at
least 85 %, more preferably at least 90 %, most preferably at least 95 % -
100 % homologous to SEQ ID NOs:3, 5, 7 or portions thereof as
determined using the Bestfit procedure of the DNA sequence analysis
software package developed by the Genetic Computer Group (GCG) at
the University of Wisconsin (gap creation penalty = 50, gap extension
penalty = 3); or (ii) a portion of a polynucleotide strand at least 60 %,
preferably at least 65 %, more preferably at least 70 %, still preferably at
least 75 %, yet preferably at least 80 %, more preferably at least 85 %,
more preferably at least 90 %, most preferably at least 95 % - 100 %
identical with SEQ ID NOs:1, 4, 6 or portions thereof as determined using
the Bestfit procedure of the DNA sequence analysis software package
developed by the Genetic Computer Group (GCG) at the University of
Wisconsin (gap creation penalty = 12, gap extension penalty = 4).
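At its simplest, designing an antisense oligonucleotide against a chosen stretch of a target strand amounts to taking the reverse complement of that stretch. The sketch below assumes the input is the sense (mRNA-like) sequence written 5' to 3' and enforces the 10-base minimum stated above; it ignores chemical modification, target-site accessibility and off-target binding, and `antisense_oligo` is a hypothetical name.

```python
COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def antisense_oligo(sense_region, min_length=10, rna=False):
    """Return the antisense (reverse-complement) sequence, written 5' to 3'.

    sense_region is the target stretch written 5' to 3' as DNA or RNA.
    Set rna=True to return an RNA oligonucleotide (U instead of T).
    """
    if len(sense_region) < min_length:
        raise ValueError(f"antisense oligonucleotide must be at least {min_length} bases")
    if set(sense_region.upper()) - set("ACGTU"):
        raise ValueError("unexpected character in sequence")
    # Complement every base, then reverse so the result reads 5' to 3'.
    antisense = sense_region.translate(COMPLEMENT)[::-1].upper()
    return antisense.replace("T", "U") if rna else antisense


if __name__ == "__main__":
    print(antisense_oligo("ATGCCGTTAGCA"))  # TGCTAACGGCAT
```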
Such antisense oligonucleotides can be used to downregulate gene
expression as further detailed hereinunder. Such an antisense
oligonucleotide is readily synthesizable using solid phase oligonucleotide
synthesis.

The ability to chemically synthesize oligonucleotides and analogs
thereof having a selected predetermined sequence offers a means for
downmodulating gene expression. Three types of gene expression
modulation strategies may be considered.

At the transcription level, antisense or sense oligonucleotides or
analogs that bind to the genomic DNA by strand displacement or by the
formation of a triple helix may prevent transcription. At the transcript
level, antisense oligonucleotides or analogs that bind target mRNA
molecules lead to the enzymatic cleavage of the hybrid by intracellular
RNase H. In this case, by hybridizing to the targeted mRNA, the
oligonucleotides or oligonucleotide analogs provide a duplex hybrid that is
recognized and destroyed by the RNase H enzyme. Alternatively, such
hybrid formation may lead to interference with correct splicing. As a
result, in both cases, the number of intact target mRNA transcripts
ready for translation is reduced or eliminated. At the translation level,
antisense oligonucleotides or analogs that bind target mRNA molecules
prevent, by steric hindrance, the binding of essential translation factors
(ribosomes) to the target mRNA, a phenomenon known in the art as
hybridization arrest, thereby disabling the translation of such mRNAs.

Thus, antisense sequences, which as described hereinabove may
arrest the expression of any endogenous and/or exogenous gene depending
on their specific sequence, have attracted much attention from scientists
and pharmacologists devoted to developing the antisense approach into a
new pharmacological tool.

For example, several antisense oligonucleotides have been shown to
arrest hematopoietic cell proliferation, growth and entry into the S phase
of the cell cycle, to reduce survival and to prevent receptor-mediated
responses.

For efficient in vivo inhibition of gene expression using antisense
oligonucleotides or analogs, the oligonucleotides or analogs must fulfill
the following requirements: (i) sufficient specificity in binding to the target
sequence; (ii) solubility in water; (iii) stability against intra- and
extracellular nucleases; (iv) capability of penetrating the cell
membrane; and (v) when used to treat an organism, low toxicity.
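Requirement (i), binding specificity, is often screened first by checking how many transcripts contain an exact match to the intended target site. The sketch below does only that: it counts exact occurrences of the target stretch (the sense sequence to which the antisense oligonucleotide pairs) in a hypothetical dictionary of transcript sequences. Real specificity checks also consider sites with mismatches, so this is a first pass under stated assumptions, not a complete test; `exact_site_hits` is a hypothetical name.

```python
def exact_site_hits(target_site, transcripts):
    """Count exact, possibly overlapping occurrences of target_site per transcript.

    transcripts maps a (hypothetical) transcript name to its sequence; only
    transcripts with at least one hit are returned, in input order.
    """
    site = target_site.upper()
    if not site:
        raise ValueError("target site must not be empty")
    hits = {}
    for name, sequence in transcripts.items():
        seq = sequence.upper()
        count = sum(
            1 for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site
        )
        if count:
            hits[name] = count
    return hits


if __name__ == "__main__":
    toy = {
        "target": "GGATGCCGTTAGCAGG",
        "other": "TTTTATGCCGTTAGCATTTT",  # an off-target exact match
        "unrelated": "ACACACACAC",
    }
    print(exact_site_hits("ATGCCGTTAGCA", toy))  # {'target': 1, 'other': 1}
```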
Unmodified oligonucleotides are typically impractical for use as
antisense sequences since they have short in vivo half-lives, being
rapidly degraded by nucleases. Furthermore, they are difficult to
prepare in more than milligram quantities. In addition, such
oligonucleotides penetrate cell membranes poorly.

Thus it is apparent that in order to meet all the above listed 
requirements, oligonucleotide analogs need to be devised in a suitable 
manner. Therefore, an extensive search for modified oligonucleotides has 
been initiated. 

For example, problems arising in connection with double-stranded
DNA (dsDNA) recognition through triple helix formation have been
diminished by a clever "switch back" chemical linking, whereby a
polypurine sequence on one strand is recognized and, by "switching
back", a homopurine sequence on the other strand can be recognized.
Also, good helix formation has been obtained by using artificial bases,
thereby improving binding conditions with regard to ionic strength and
pH.

In addition, in order to improve half-life as well as membrane
penetration, a large number of variations in polynucleotide backbones
have been made, albeit with little success.

Oligonucleotides can be modified in the base, the sugar or the
phosphate moiety. These modifications include, for example, the use of
methylphosphonates, monothiophosphates, dithiophosphates,
phosphoramidates, phosphate esters, bridged phosphorothioates, bridged
phosphoramidates, bridged methylenephosphonates, dephospho
internucleotide analogs with siloxane bridges, carbonate bridges,
carboxymethyl ester bridges, acetamide bridges, carbamate bridges,
thioether bridges, sulfoxy bridges, sulfono bridges, various "plastic"
DNAs, α-anomeric bridges and borane derivatives.

International patent application WO 89/12060 discloses various
building blocks for synthesizing oligonucleotide analogs, as well as
oligonucleotide analogs formed by joining such building blocks in a
defined sequence. The building blocks may be either "rigid" (i.e.,
containing a ring structure) or "flexible" (i.e., lacking a ring structure). In
both cases, the building blocks contain a hydroxy group and a mercapto
group, through which the building blocks are said to join to form
oligonucleotide analogs. The linking moiety in the oligonucleotide
analogs is selected from the group consisting of sulfide (-S-), sulfoxide
(-SO-) and sulfone (-SO2-).

International patent application WO 92/20702 describes an acyclic
oligonucleotide which includes a peptide backbone on which any selected
chemical nucleobases or analogs are strung and serve as coding
characters, as they do in natural DNA or RNA. These new compounds,
known as peptide nucleic acids (PNAs), are not only more stable in cells
than their natural counterparts but also bind natural DNA and RNA 50 to
100 times more tightly than the natural nucleic acids cling to each other.
PNA oligomers can be synthesized from the four protected monomers
containing thymine, cytosine, adenine and guanine by Merrifield
solid-phase peptide synthesis. In order to increase solubility in water and
to prevent aggregation, a lysine amide group is placed at the C-terminal
region and may be pegylated.

Antisense technology thus relies on pairing a messenger RNA with an oligonucleotide to form a double helix that inhibits translation. The concept of antisense-mediated gene therapy was introduced as early as 1978 for cancer therapy. This approach was based on targeting certain genes that are crucial for the division and growth of cancer cells. Synthetic DNA fragments can achieve this goal: such molecules bind to the targeted messenger RNA molecules in tumor cells, thereby inhibiting translation of the genes and disrupting the growth of these cells. Other mechanisms have also been proposed. These strategies have been used, with some success, in the treatment of cancers as well as other illnesses, including viral and other infectious diseases. Antisense oligonucleotides are typically synthesized in lengths of 13-30 nucleotides. The life span of oligonucleotide molecules in blood is rather short; they therefore have to be chemically modified to prevent destruction by the nucleases present throughout the body. Phosphorothioates are a very widely used modification of antisense oligonucleotides in ongoing clinical trials. A new generation of antisense molecules consists of hybrid antisense oligonucleotides with a central portion of synthetic DNA, while four bases on each end have been modified with 2'-O-methyl ribose to resemble RNA. In preclinical studies in laboratory animals, such compounds have demonstrated greater stability to metabolism in body tissues and an improved safety profile when compared with the first-generation unmodified phosphorothioates. Dozens of other nucleotide analogs have also been tested in antisense technology.
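As a rough illustration of the base-pairing rule described above, the sketch below derives the DNA antisense sequence for a given mRNA window (its reverse complement) and enforces the typical 13-30 nucleotide length. The helper name `antisense_dna` and the example target are hypothetical; chemical modifications such as phosphorothioate or 2'-O-methyl chemistry are not modeled.

```python
# Watson-Crick pairing for a DNA oligonucleotide against an RNA (U) or cDNA (T) target.
_PAIR = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def antisense_dna(target: str, min_len: int = 13, max_len: int = 30) -> str:
    """Return the 5'->3' DNA sequence that pairs antiparallel with a 5'->3' target window.

    Hypothetical helper: only the unmodified base sequence is computed; backbone or
    sugar modifications (e.g. phosphorothioates) are outside this sketch.
    """
    seq = target.upper()
    if not min_len <= len(seq) <= max_len:
        raise ValueError(f"expected {min_len}-{max_len} nt, got {len(seq)}")
    unknown = set(seq) - set(_PAIR)
    if unknown:
        raise ValueError(f"unexpected bases: {sorted(unknown)}")
    # Antiparallel pairing: read the target 3'->5' and complement each base.
    return "".join(_PAIR[base] for base in reversed(seq))


if __name__ == "__main__":
    # Hypothetical 15-nt mRNA window, not taken from any SEQ ID NO of this document.
    print(antisense_dna("AUGCCGAUUGGCAAC"))  # GTTGCCAATCGGCAT
```

Because the oligonucleotide pairs antiparallel, the target is read from its 3' end; simply complementing base by base without reversing would give a sequence written in the wrong orientation.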
RNA oligonucleotides may also be used for antisense inhibition, as they form a stable RNA-RNA duplex with the target, suggesting efficient inhibition. However, due to their low stability, RNA oligonucleotides are typically expressed inside the cells using vectors designed for this purpose. This approach is favored when attempting to target an mRNA that encodes an abundant and long-lived protein.

Recent scientific publications have validated the efficacy of antisense compounds in animal models of hepatitis, cancers, coronary artery restenosis and other diseases. The first antisense drug was recently approved by the FDA. This drug, Fomivirsen, developed by Isis, is indicated for the local treatment of cytomegalovirus (CMV) retinitis in patients with AIDS who are intolerant of, or have a contraindication to, other treatments for CMV retinitis, or who were insufficiently responsive to previous treatments for CMV retinitis (Pharmacotherapy News Network).

Several antisense compounds are now in clinical trials in the United States. These include locally administered antivirals and systemic cancer therapeutics. Antisense therapeutics have the potential to treat many life-threatening diseases, with a number of advantages over traditional drugs. Traditional drugs intervene after a disease-causing protein is formed. Antisense therapeutics, however, act on the mRNA to block its translation and thus intervene before the protein is formed; since antisense therapeutics target only one specific mRNA, they should be more effective, with fewer side effects, than current protein-inhibiting therapy.

A second option for disrupting gene expression, at the level of transcription, uses synthetic oligonucleotides capable of hybridizing with double-stranded DNA to form a triple helix. Such oligonucleotides may prevent binding of transcription factors to the gene's promoter and therefore inhibit transcription. Alternatively, they may prevent duplex unwinding and, therefore, transcription of genes within the triple helical structure.

Thus, according to a further aspect of the present invention there is provided a pharmaceutical composition comprising the antisense oligonucleotide described herein and a pharmaceutically acceptable carrier. The pharmaceutically acceptable carrier can be, for example, a liposome loaded with the antisense oligonucleotide. Formulations for topical administration may include, but are not limited to, lotions, ointments, gels, creams, suppositories, drops, liquids, sprays and powders. Conventional pharmaceutical carriers, aqueous, powder or oily bases,



WO 01/00643 PCT/IL00/00358 


thickeners and the like may be necessary or desirable. Compositions for oral administration include powders or granules, suspensions or solutions in water or non-aqueous media, sachets, capsules or tablets. Thickeners, diluents, flavorings, dispersing aids, emulsifiers or binders may be desirable. Formulations for parenteral administration may include, but are not limited to, sterile aqueous solutions, which may also contain buffers, diluents and other suitable additives.

According to still a further aspect of the present invention there is provided a ribozyme comprising the antisense oligonucleotide described herein and a ribozyme sequence fused thereto. Such a ribozyme is readily synthesizable using solid phase oligonucleotide synthesis.

Ribozymes are being increasingly used for the sequence-specific inhibition of gene expression by cleaving mRNAs that encode proteins of interest. The possibility of designing ribozymes to cleave any specific target RNA has rendered them valuable tools in both basic research and therapeutic applications. In the therapeutics area, ribozymes have been exploited to target viral RNAs in infectious diseases, dominant oncogenes in cancers and specific somatic mutations in genetic disorders. Most notably, several ribozyme gene therapy protocols for HIV patients are already in Phase 1 trials. More recently, ribozymes have been used for transgenic animal research, gene target validation and pathway elucidation. Several ribozymes are in various stages of clinical trials. ANGIOZYME was the first chemically synthesized ribozyme to be studied in human clinical trials. ANGIOZYME specifically inhibits formation of VEGF-R (Vascular Endothelial Growth Factor receptor), a key component in the angiogenesis pathway. Ribozyme Pharmaceuticals, Inc., as well as other firms, have demonstrated the importance of anti-angiogenesis therapeutics in animal models. HEPTAZYME, a ribozyme designed to selectively destroy Hepatitis C Virus (HCV) RNA, was found effective in decreasing Hepatitis C viral RNA in cell culture assays (Ribozyme Pharmaceuticals, Incorporated, web home page).

According to still another aspect of the present invention there is provided an antibody comprising an immunoglobulin specifically recognizing and binding a polypeptide at least 60 %, preferably at least 65 %, more preferably at least 70 %, still preferably at least 75 %, yet preferably at least 80 %, more preferably at least 85 %, more preferably at least 90 %, most preferably at least 95 % - 100 % homologous (identical + similar) to SEQ ID NOs:3, 5, 7 or portions thereof, as determined using the Bestfit procedure of the DNA sequence analysis software package developed by the Genetic Computer Group (GCG) at the University of Wisconsin (gap creation penalty = 50, gap extension penalty = 3). According to a preferred embodiment of this aspect of the present invention, the antibody specifically recognizes and binds the polypeptides set forth in SEQ ID NOs:3, 5, 7 or portions thereof.
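The "identical + similar" homology figures used throughout this document (for example, 45.5 % identity plus 7.3 % similarity, a total of 52.8 %, in Example 2) can be illustrated with the sketch below, which scores an existing pairwise alignment. It is not GCG Bestfit: the similarity groups and the choice to count only gap-free columns are assumptions made for illustration, whereas the GCG programs apply their own scoring-matrix-based definition of similarity.

```python
from typing import Tuple

# Hypothetical conservative-substitution groups (an assumption, not GCG's definition).
SIMILAR_GROUPS = ("ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG")


def _similar(a: str, b: str) -> bool:
    """True if two different residues fall in the same hypothetical group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def percent_homology(aln1: str, aln2: str) -> Tuple[float, float, float]:
    """Return (% identity, % similarity, % total homology) for two aligned protein strings.

    '-' marks a gap; columns containing a gap are skipped (an assumption).
    """
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = compared = 0
    for a, b in zip(aln1.upper(), aln2.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1
        elif _similar(a, b):
            similar += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    identity = 100.0 * identical / compared
    similarity = 100.0 * similar / compared
    return identity, similarity, identity + similarity


if __name__ == "__main__":
    # Toy alignment: 3 identical, 3 similar and 1 gapped column.
    print(percent_homology("MKVLA-E", "MRILAGD"))  # (50.0, 50.0, 100.0)
```

The total homology is simply the sum of the two percentages, which is how the combined figures in the text are obtained.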

The present invention can utilize serum immunoglobulins, polyclonal antibodies or fragments thereof (i.e., immunoreactive derivatives of an antibody), or monoclonal antibodies or fragments thereof. Monoclonal antibodies or purified fragments of the monoclonal antibodies having at least a portion of an antigen binding region, such as Fv, F(ab')2 and Fab fragments (Harlow and Lane, 1988, Antibodies, Cold Spring Harbor), single chain antibodies (U.S. Patent 4,946,778), chimeric or humanized antibodies and complementarity-determining regions (CDRs), may be prepared by conventional procedures. Purification of these serum immunoglobulins, antibodies or fragments can be accomplished by a variety of methods known to those of skill in the art, including precipitation by ammonium sulfate or sodium sulfate followed by dialysis against saline, ion exchange chromatography, affinity or immunoaffinity chromatography, as well as gel filtration, zone electrophoresis, etc. (see Goding, in Monoclonal Antibodies: Principles and Practice, 2nd ed., pp. 104-126, 1986, Orlando, Fla., Academic Press). Under normal physiological conditions antibodies are found in plasma and other body fluids and in the membrane of certain cells, and are produced by lymphocytes of the type denoted B cells or their functional equivalent. Antibodies of the IgG class are made up of four polypeptide chains linked together by disulfide bonds. The four chains of intact IgG molecules are two identical heavy chains, referred to as H-chains, and two identical light chains, referred to as L-chains. Additional classes include IgD, IgE, IgA, IgM and related proteins.

Methods for the generation and selection of monoclonal antibodies are well known in the art, as summarized, for example, in reviews such as Tramontano and Schloeder, Methods in Enzymology 178, 551-568, 1989. A recombinant protein of the present invention may be used to generate antibodies in vitro. More preferably, the recombinant protein of the present invention is used to elicit antibodies in vivo. In general, a suitable host animal is immunized with the recombinant protein of the present invention. Advantageously, the host animal is a mouse of an inbred strain. Animals are typically immunized with a mixture comprising a solution of the recombinant protein of the present invention in a physiologically acceptable vehicle and any suitable adjuvant that enhances the immune response to the immunogen. By way of example, the primary immunization conveniently may be accomplished with a mixture of a solution of the recombinant protein of the present invention and Freund's complete adjuvant, said mixture being prepared in the form of a water-in-oil emulsion. Typically, the immunization may be administered to the animals intramuscularly, intradermally, subcutaneously, intraperitoneally, into the footpads, or by any appropriate route of administration. The immunization schedule of the immunogen may be adapted as required, but customarily involves several subsequent or secondary immunizations using a milder adjuvant such as Freund's incomplete adjuvant. Antibody titers and specificity of binding to the recombinant protein can be determined during the immunization schedule by any convenient method, including, by way of example, radioimmunoassay or enzyme-linked immunosorbent assay (ELISA). When suitable antibody titers are achieved, antibody-producing lymphocytes from the immunized animals are obtained, and these are cultured, selected and cloned, as is known in the art. Typically, lymphocytes may be obtained in large numbers from the spleens of immunized animals, but they may also be retrieved from the circulation, the lymph nodes or other lymphoid organs. Lymphocytes are then fused with any suitable myeloma cell line to yield hybridomas, as is well known in the art. Alternatively, lymphocytes may also be stimulated to grow in culture, and may be immortalized by methods known in the art, including the exposure of these lymphocytes to a virus, a chemical or a nucleic acid such as an oncogene, according to established protocols.
After fusion, the hybridomas are cultured under suitable culture conditions, for example in multiwell plates, and the culture supernatants are screened to identify cultures containing antibodies that recognize the hapten of choice. Hybridomas that secrete antibodies that recognize the recombinant protein of the present invention are cloned by limiting dilution and expanded under appropriate culture conditions. Monoclonal antibodies are purified and characterized in terms of immunoglobulin type and binding affinity.
Additional objects, advantages, and novel features of the present invention will become apparent to one ordinarily skilled in the art upon examination of the following examples, which are not intended to be limiting. Additionally, each of the various embodiments and aspects of the present invention as delineated hereinabove and as claimed in the claims section below finds experimental support in the following examples.

EXAMPLES 

Reference is now made to the following examples, which together with the above descriptions illustrate the invention in a non-limiting fashion.

Generally, the nomenclature used herein and the laboratory procedures in recombinant DNA technology described below are those well known and commonly employed in the art. Standard techniques are used for cloning, DNA and RNA isolation, amplification and purification. Generally, enzymatic reactions involving DNA ligase, DNA polymerase, restriction endonucleases and the like are performed according to the manufacturers' specifications. These techniques and various other techniques are generally performed according to Sambrook et al., Molecular Cloning - A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1989), which is incorporated herein by reference. Other general references are provided throughout this document. The procedures therein are believed to be well known in the art and are provided for the convenience of the reader. All the information contained therein is incorporated herein by reference.

Materials and Experimental Methods 
The following protocols and experimental details are referenced in 
the Examples that follow: 

Primers list:

hnl1116    5'-GAGAGCAAGTCTGTGTTGATTC-3'        (SEQ ID NO:10)
hnl1230    5'-CACTGGTAGCCATGAGTGTGAG-3'        (SEQ ID NO:11)
hnlu350    5'-TTGGTCATCCCTCCAGTCACCA-3'        (SEQ ID NO:12)
pn9-312u   5'-CTTGCCTGTAGACAGAGCTGCAG-3'       (SEQ ID NO:14)
hpu-685    5'-GAGCAGCCAGGTGAGCCCAAGA-3'        (SEQ ID NO:16)
hpl967     5'-TCAGATGCAAGCAGCAACTTTGGCXr       (SEQ ID NO:17)
mnlu118    5'-CACCCTGATGTCATGCTGGAG-3'         (SEQ ID NO:18)
mnl1563    5'-CATCTAGGAGAGCAATGACGTTC-3'       (SEQ ID NO:19)
Ap1        5'-CCATCCTAATACGACTCACTATAGGGC-3'   (SEQ ID NO:20)
Ap2        5'-ACTCACTATAGGGCTCGAGCGGC-3'       (SEQ ID NO:21)
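For orientation, the sketch below computes the length, GC content and a Wallace-rule melting temperature estimate, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C), for the primer pair used in the tissue-distribution and chromosome-localization PCRs. The Wallace rule is a crude approximation for short oligonucleotides and is not claimed to be how these primers or annealing temperatures were chosen; sequences are copied from the list above (names normalized), and hpl967 is not used because its 3' end is illegible.

```python
# Primer sequences copied from the list above (5'->3').
PRIMERS = {
    "hnl1116": "GAGAGCAAGTCTGTGTTGATTC",  # SEQ ID NO:10
    "hnlu350": "TTGGTCATCCCTCCAGTCACCA",  # SEQ ID NO:12
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq: str) -> int:
    """Wallace-rule estimate in degrees C: 2 per A/T plus 4 per G/C (short oligos only)."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC {100 * gc_fraction(seq):.1f} %, Tm ~{wallace_tm(seq)} C")
```

For this pair the rule gives about 64 °C and 68 °C, in the same range as the 64 °C annealing step used with these primers in the protocols that follow.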

Southern analysis: 

Genomic DNA was extracted from animal or human blood using the Blood and Cell Culture DNA Maxi Kit (Qiagen). DNA was digested with EcoRI, separated by gel electrophoresis and transferred to a Hybond N+ nylon membrane (Amersham). PCR products underwent a similar procedure. Hybridization was performed at 68 °C in 6 x SSC, 1 % SDS, 5 x Denhardt's, 10 % dextran sulfate, 100 ng/ml salmon sperm DNA, and 32P-labeled probe. Pn9, a 1.7 kb fragment that contains the entire open reading frame except for a deletion of 162 nucleotides (del:473-634, SEQ ID NO:1), was used as a probe. Following hybridization, the membrane was washed with 3 x SSC, 0.1 % SDS at 68 °C and exposed to X-ray film for 3 days. Membranes were then washed with 0.1 x SSC, 0.1 % SDS at 68 °C and re-exposed for 4 days.
RT-PCR: 

RNA was prepared using TRI-Reagent (Molecular Research Center, Inc.) according to the manufacturer's instructions. An aliquot of 1.25 µg was taken for a reverse transcription reaction using SuperScript II reverse transcriptase (Gibco BRL) and Oligo(dT)15 primer (SEQ ID NO:22) (Promega). Amplification of the resulting first-strand cDNA was performed with Taq polymerase (Promega) or with Expand High Fidelity (Boehringer Mannheim).

cDNA Sequence analysis: 

Sequence determinations were performed with vector-specific and gene-specific primers, using an automated DNA sequencer (Applied Biosystems, model 373A). Each nucleotide was read from at least two independent primers. Computation, sequence analysis and alignments were done using the DNA sequence analysis software package developed by the Genetic Computer Group (GCG) at the University of Wisconsin. Alignments of two sequences were performed using Bestfit (gap creation penalty = 12, gap extension penalty = 4) or with the Gap program (gap creation penalty = 50, gap extension penalty = 3).
Tissue distribution: 

Tissue distribution of the hnhpl transcript was determined by semi-quantitative PCR. cDNA panels were obtained from Clontech. PCR was performed with the gene-specific primers hnlu350 (SEQ ID NO:12) and hnl1116 (SEQ ID NO:10). The PCR program was as follows: 94 °C, 3 minutes, followed by 40 cycles of 94 °C, 45 seconds; 64 °C, 1 minute; 72 °C, 1 minute. Samples were taken for further analysis following 25, 30, 35 and 40 cycles.
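Sampling at several cycle numbers is what makes this PCR semi-quantitative: comparisons between tissues are only meaningful while amplification is still roughly exponential, before the reaction plateaus. The toy model below illustrates this; the per-cycle efficiency, starting copy numbers and plateau level are arbitrary assumptions, not measurements from this study, and `toy_pcr_yield` is a hypothetical helper.

```python
SAMPLED_CYCLES = (25, 30, 35, 40)  # sampling points listed in the protocol above


def toy_pcr_yield(initial_copies: float, cycles: int, efficiency: float = 0.9,
                  plateau: float = 1e12) -> float:
    """Toy model: copies grow by (1 + efficiency) per cycle, capped at a plateau."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return min(initial_copies * (1.0 + efficiency) ** cycles, plateau)


if __name__ == "__main__":
    # Two hypothetical templates differing 1000-fold in starting abundance.
    for label, start in (("abundant", 1e4), ("rare", 10.0)):
        yields = [toy_pcr_yield(start, n) for n in SAMPLED_CYCLES]
        summary = ", ".join(f"{n} cycles: {y:.2e}" for n, y in zip(SAMPLED_CYCLES, yields))
        print(f"{label}: {summary}")
```

In this toy run the abundant template has already reached the plateau by cycle 30, whereas the rare one is still increasing exponentially through cycle 35, so a single end-point reading would hide the difference that the earlier sampling points reveal.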

Chromosome localization: 

Chromosome localization of hnhpl was performed using the Stanford G3 radiation hybrid panel. This panel was provided by the Human Genome Center at the Weizmann Institute. A 225 bp genomic fragment of the hnhpl gene was amplified using the gene-specific primers hnlu350 (SEQ ID NO:12) and hnl1116 (SEQ ID NO:10). The PCR program was as follows: 94 °C, 3 minutes, followed by 39 cycles of 94 °C, 45 seconds; 64 °C, 1 minute; 72 °C, 1 minute. Results were analyzed through the RH server at the Stanford Human Genome Center.

EXAMPLE 1 

Cloning an EST for a novel heparanase gene

The entire amino acid sequence of human heparanase (SEQ ID NO:9) was used to screen the human EST database for homologous sequences. Screening was performed using the BLAST 2.0 server at the NCBI (basic BLAST search, tblastn program).

A distantly homologous fragment was pulled out: accession number AI222323, IMAGE clone number 1843155 from the Soares_NFL_T_GBC_S1 Homo sapiens cDNA library prepared from testis, B cells and fetal lung. The search values for this sequence were as follows: Score = 38.3 bits (87), Expect = 0.15, Identities = 16/36 (44 %), Positives = 22/36 (60 %). The sequence of accession number AI222323 contains 378 nucleotides of the 3' end of clone 1843155 (complementary to nucleotides 165-543 of SEQ ID NO:23).

This clone was purchased from the IMAGE consortium. It contained an insert of 560 bp (SEQ ID NO:23). The entire nucleotide sequence was determined and compared to the hpa cDNA encoding human heparanase. The homology between clone 1843155 and hpa cDNA was restricted to the 3' region of the cDNA clone. There was 59 % homology between nucleotides 99-275 of clone 1843155 (SEQ ID NO:23) and nucleotides 1532-1708 of hpa (SEQ ID NO:24). The deduced amino acid sequence of this region had 60 % homology (identical + similar) to amino acids 488-542 of human heparanase (SEQ ID NO:9). The downstream sequence (nucleotides 276-560, SEQ ID NO:23) represents a 3' untranslated region and a poly A tail. The upstream sequence, nucleotides 1-98 (SEQ ID NO:23), was unrelated to heparanase. This unrelated sequence was found to be identical to a different cDNA clone from the same library. Therefore, the human EST clone 1843155 obtained from the IMAGE consortium is assumed to be a chimera, which contains two unrelated partial cDNAs ligated into a single vector.

EXAMPLE 2 
Cloning a cDNA for a novel heparanase gene 
In order to isolate the entire cDNA, three primers were designed according to the sequence of clone 1843155. The cDNA was amplified from placenta cDNA by Marathon RACE (rapid amplification of cDNA ends) (Clontech, Palo Alto, California) according to the manufacturer's instructions. The first cycle was performed with the gene-specific primer hnl1116 (SEQ ID NO:10) and the universal primer Ap1 (SEQ ID NO:20). The second cycle was performed with the gene-specific primer hnl1230 (SEQ ID NO:11) and the universal primer Ap2 (SEQ ID NO:21). Following amplification, a diffuse band of approximately 1.7 kb was obtained. This cDNA amplification product was subcloned into pGEM-T Easy (Promega, Madison, WI), and the nucleotide sequences of three independent clones, pn5, pn6 and pn9, were determined. The consensus sequence of the longest cDNA, pn6, appears in Figure 1 (SEQ ID NOs:1, 2 and 3). It is 2060 nucleotides long and contains an open reading frame of 1776 nucleotides, which encodes a polypeptide of 592 amino acids with a calculated molecular weight of 66.5 kDa. The newly cloned gene was
designated hnhpl. The nucleotide sequences of the two shorter forms, pn9 and pn5, are set forth in SEQ ID NOs:4 and 6, and their deduced amino acid sequences in SEQ ID NOs:5 and 7, respectively. Pn9 and pn5 were identical to pn6; however, each of them contained an in-frame deletion as a result of alternative splicing. Pn9 contains a deletion of 162 nucleotides, 473-634 of SEQ ID NO:1, which correspond to amino acids 150-203 of SEQ ID NO:3. As a result, pn9 encodes a putative polypeptide of 538 amino acids (SEQ ID NO:5) having a calculated molecular weight of 60.4 kDa. Pn5 contains a deletion of 336 nucleotides, 473-808 of SEQ ID NO:1, which correspond to amino acids 150-261 of SEQ ID NO:3; thus, it encodes a putative polypeptide of 480 amino acids (SEQ ID NO:7) having a calculated molecular weight of 53.9 kDa. The 11th amino acid residue of SEQ ID NO:3 is methionine. It is generally accepted that the first methionine serves as the translation start site in mammals; however, the nucleotides surrounding the second ATG fit the Kozak consensus sequence for the translation start site better. Translation may thus start at the second methionine and produce a protein of 581 amino acids with a calculated molecular weight of 65.4 kDa. The presence of transcripts of variable length was confirmed by PCR amplification of the hnhpl cDNA using two gene-specific primers: pn9-312u (SEQ ID NO:14), which is located close to the 5' end, and hnl1230 (SEQ ID NO:11), which overlaps the stop codon at the 3' end of the open reading frame. Amplification was performed from Marathon-Ready cDNA prepared from placenta and from testis. The
PCR products are shown in Figure 3. Four bands were obtained from placenta: two major bands of 1.45 and 1.6 kb, similar to pn9 and pn6, and two minor bands, one of 1.35 kb, similar to pn5, and a second one of 1.8 kb. The sequence of the latter has not yet been determined. Amplification of testis cDNA resulted in a different pattern: four bands of 1.35, 1.65, 1.85 and 2.05 kb were observed, as well as a minor one of 1.5 kb. The various forms appear to represent products of alternative splicing. Since the deletions characterized so far retain an open reading frame, the translation products of the various cDNAs may constitute a protein family. The comparison between the amino acid sequences of hnhpl and heparanase is shown in Figure 3. Using the Gap program of the GCG package, which aligns the entire amino acid sequences, the homology between the two proteins is 45.5 % identity and 7.3 % similarity, a total homology of 52.8 % (gap creation penalty = 50, gap extension penalty = 3). The BestFit program defines the region of best homology between the two sequences. Using this program, the homology between the two amino acid sequences starts at position 63 of hnhpl (SEQ ID NO:3) and position 41 of heparanase (SEQ ID NO:9) and is 47.5 % identity and 7.8 % similarity, i.e., a homology of 55.3 %. The homology between the nucleotide sequences of hnhpl and hpa is 57 %, as calculated by the BestFit program. The homologous region is located between nucleotides 638-1812 of hnhpl (SEQ ID NO:1) and nucleotides 564-1708 of hpa (SEQ ID NO:24). Using the Gap program, the homology is 51 % over the entire sequence (gap creation penalty = 50, gap extension penalty = 3).
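The length arithmetic behind the pn9 and pn5 splice forms can be checked mechanically: an inclusive nucleotide range whose length is a multiple of three removes whole codons and keeps the reading frame, and the removed residues are subtracted from the 592-residue pn6 polypeptide. The sketch below, with the hypothetical helpers `in_frame_deletion` and `proportional_kda`, reproduces the reported lengths; the molecular-weight estimate merely scales the 66.5 kDa pn6 value by length (about 112 Da per residue) and is not the sequence-based calculation used in the text.

```python
from typing import Tuple

FULL_LENGTH_AA = 592    # pn6 polypeptide, SEQ ID NO:3
FULL_LENGTH_KDA = 66.5  # calculated molecular weight reported for pn6


def in_frame_deletion(nt_start: int, nt_end: int,
                      full_length_aa: int = FULL_LENGTH_AA) -> Tuple[int, int, int]:
    """Return (deleted nt, deleted aa, remaining aa) for an inclusive 1-based nt range."""
    deleted_nt = nt_end - nt_start + 1
    if deleted_nt % 3:
        raise ValueError("deletion length is not a multiple of 3: the frame would shift")
    deleted_aa = deleted_nt // 3
    return deleted_nt, deleted_aa, full_length_aa - deleted_aa


def proportional_kda(residues: int) -> float:
    """Rough estimate: scale the pn6 weight by length (assumes an unchanged average residue mass)."""
    return round(residues * FULL_LENGTH_KDA / FULL_LENGTH_AA, 1)


if __name__ == "__main__":
    for name, (start, end) in {"pn9": (473, 634), "pn5": (473, 808)}.items():
        nt, aa, remaining = in_frame_deletion(start, end)
        print(name, nt, aa, remaining, proportional_kda(remaining))
    # pn9 162 54 538 60.4
    # pn5 336 112 480 53.9
```

The deleted residue counts also match the amino acid ranges given in the text (150-203 is 54 residues, 150-261 is 112 residues).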

EXAMPLE 3

Zoo blot 

Hnhpl cDNA was used as a probe to detect homologous sequences in human DNA and in the DNA of various animals. The autoradiogram of the Southern analysis is presented in Figure 4. Several bands were detected in human DNA. Several intense bands were detected in all mammals, while faint bands were detected in chicken. This correlates with the phylogenetic relation between human and the tested animals. The intense bands indicate that hnhpl is conserved among mammals as well as in more genetically distant organisms. The multiple-band patterns suggest that in all animals the hnhpl locus occupies a large genomic region. Several specific bands disappeared after a stringent wash. These may represent homologous sequences and suggest the existence of a gene family, members of which can be isolated based on their homology to the human hnhpl reported here.

EXAMPLE 4 
Comparison to heparanase via cross-hybridization

In order to check the capability of hpa and hnhpl to cross-hybridize under low-stringency conditions, the entire coding regions of human hpa and hnhpl were amplified by PCR. Human hpa was amplified from platelet mRNA by RT-PCR using the primers hpu-685 (SEQ ID NO:16) and hpl967 (SEQ ID NO:17), and hnhpl was amplified from testis using the primers hnl1230 (SEQ ID NO:11) and pn9-312u (SEQ ID NO:14). The products were quantified, and samples of 100 pg and 1 ng were run on an agarose gel and subjected to Southern hybridization. The membranes were probed with 32P-labeled hpa cDNA and with hnhpl cDNA. No cross-hybridization was observed (Figure 5), even after overexposure for 5 days. Since hpa is the most similar sequence known today to that of hnhpl, this experiment indicates that the bands detected in the autoradiograph of Figure 4 are of the hnhpl gene or of yet unknown sequences homologous thereto, which might constitute a gene family. This further indicates that such sequences are isolatable using hnhpl as a probe to screen the relevant libraries, or using hnhpl-derived PCR primers to amplify the relevant cDNA or DNA sequences.

EXAMPLE 5 
Chromosome localization 

The chromosome localization of hnhpl was determined using the G3 radiation hybrid panel. Hnhpl was amplified from 83 human/mouse radiation hybrids. The results were analyzed by the RH server, and the hnhpl gene was mapped to chromosome 10, next to the marker SHGC-57721. The results also indicated the possibility of a second copy of the gene.

EXAMPLE 6 
Expression Pattern of hnhpl
The tissue distribution of hnhpl transcripts was determined using calibrated human cDNA panels (Clontech, Palo Alto, CA). The results are shown in Table 1 below. The expression level is generally low: PCR products were clearly observed only after 40 cycles of amplification.

TABLE 1 

Tissue 

Bone marrow 
Liver 

Lymph node 
Leukocytes 
Spleen 
Thymus 
Tonsil 
Colon 
Ovary 
Prostate 
Small intestine 
Testis 

EXAMPLE 7 
Cloning of a mouse homologue
Screening of the mouse EST database with the amino acid sequences of heparanase and of hnhpl pulled out a mouse EST clone that shares distant homology with heparanase and remarkably high homology with hnhpl. The EST clone 1378452 (accession number AI019269) from mouse thymus was 351 nucleotides long, and it is set forth in SEQ ID NO:8. It has 61-63 % identity over 161 nucleotides (191-351, SEQ ID NO:8) to the human (SEQ ID NO:24) and mouse (SEQ ID NO:15) hpa nucleotide sequences, and 93 % identity to the hnhpl nucleotide sequence (SEQ ID NO:1), using the BestFit program of the GCG package. The nucleotide sequence of this clone did not contain an open reading frame. Two frame shifts were identified in the sequence found in the EST database, as compared to the hnhpl sequence. These frame shifts were later confirmed by nucleotide sequence analysis of this clone, as well as by isolation of this fragment from BL6 mouse melanoma cells and determination of its nucleotide sequence. This mouse gene is transcribed at very low levels: no amplification products were obtained following 40 cycles of PCR from a mouse cDNA panel (Clontech, Palo Alto, CA) that included cDNA from mouse heart, brain, spleen, lung, liver, skeletal muscle, kidney, testis and embryos of 7, 11, 15 and 17 days. The amplification was performed using the gene-specific primers mnlu118 (SEQ ID NO:18) and mnl1563 (SEQ ID NO:19).
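Why a frame shift can leave an EST without an open reading frame can be illustrated with a simple forward-strand scan: each of the three frames is searched for an ATG followed in frame by a stop codon. The helper `longest_orf` and the toy sequences are hypothetical and do not reproduce SEQ ID NO:8; practical ORF finders also scan the reverse strand and may use other start rules.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Tuple[int, int, int]:
    """Return (frame, start, end) of the longest ATG...stop ORF on the forward strand.

    Coordinates are 0-based with `end` exclusive (the stop codon is included).
    Returns (-1, 0, 0) when no ATG is followed by an in-frame stop codon.
    """
    seq = seq.upper()
    best = (-1, 0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i  # open a candidate ORF
            elif codon in STOP_CODONS:
                if i + 3 - start > best[2] - best[1]:
                    best = (frame, start, i + 3)
                start = None  # close it and keep scanning this frame
    return best


if __name__ == "__main__":
    intact = "CCATGAAATTTTAGCC"    # ATG AAA TTT TAG in frame 2
    shifted = "CCATGCAAATTTTAGCC"  # same sequence with one C inserted after the ATG
    print(longest_orf(intact))   # (2, 2, 14)
    print(longest_orf(shifted))  # (-1, 0, 0)
```

A single inserted base moves the downstream codons out of register with the ATG, so the original stop codon is no longer read in frame and the scan finds no complete ORF.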

EXAMPLE 8 
Expression of hnhpl in mammalian cells 
A mammalian expression vector was constructed in order to overexpress hnhpl in human cells. To enable detection of the Hnhpl translation product, the hnhpl expression vector was designed to encode a C-terminally tagged hnl protein. A DNA sequence encoding the eight-amino-acid FLAG epitope (Kodak) was fused to the 3' end of the hnhpl open reading frame.
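At the sequence level, a C-terminal tag of this kind amounts to placing the tag codons in frame between the last sense codon and the stop codon. The sketch below shows that step; `add_c_terminal_tag` is hypothetical, and the codons used for the FLAG peptide (DYKDDDDK) are one possible encoding chosen for illustration, not necessarily the sequence carried by primer hnl-c-flag (SEQ ID NO:25).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# One possible DNA encoding of the FLAG peptide D-Y-K-D-D-D-D-K (illustrative codon choice).
FLAG_CODONS = "GAC" "TAC" "AAG" "GAC" "GAC" "GAT" "GAC" "AAG"


def add_c_terminal_tag(orf: str, tag: str = FLAG_CODONS) -> str:
    """Insert in-frame tag codons immediately before the ORF's stop codon."""
    orf, tag = orf.upper(), tag.upper()
    if len(orf) % 3 or len(tag) % 3:
        raise ValueError("ORF and tag lengths must be multiples of 3")
    if orf[-3:] not in STOP_CODONS:
        raise ValueError("ORF must end with a stop codon")
    return orf[:-3] + tag + orf[-3:]


if __name__ == "__main__":
    toy_orf = "ATGAAACCCTAA"  # hypothetical 3-codon ORF plus stop, not hnhpl
    tagged = add_c_terminal_tag(toy_orf)
    print(tagged)
    print(len(tagged) // 3 - 1, "sense codons")  # 3 original + 8 tag = 11
```

Keeping the stop codon after the tag is what makes the tag C-terminal; placing the tag after the original stop codon would leave it untranslated.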

Fusion of the FLAG sequence to the hnhpl coding sequence was generated by PCR amplification using the primer hnl-c-flag, 5'- ... A-3' (SEQ ID NO:25), and the primer pn9-312u (SEQ ID NO:14). The PCR program was as follows: 94 °C, 3 minutes, followed by 5 cycles of 94 °C, 45 seconds; 50 °C, 45 seconds; 72 °C, 2 minutes; and then 32 cycles of 94 °C, 45 seconds; 64 °C, 45 seconds; 72 °C, 2 minutes.

The amplification product was subcloned into pGEM-T Easy, and the sequence was verified. The resulting plasmids were designated pGEM-pn6F and pGEM-pn9F.

Two constructs were generated in the pSI mammalian expression vector (Promega): the first contained the complete hnhpl sequence (pn6) and the second contained the alternative splice form (pn9). The pSI-pn6 expression vector was constructed by triple ligation of the following fragments: an EcoRI-BamHI fragment, which contains the 5' end of hnl-pn6, excised from pGEM-T-easy-pn9; a BamHI-NotI fragment, which contains the 3' FLAG-tagged hnhpl, excised from pGEM-pn6F; and pSI digested with EcoRI-NotI.
The pSI-pn9 expression vector was constructed similarly, by triple ligation of the following fragments: an EcoRI-SspI fragment, which contains the 5' end of hnhpl-pn6, excised from pGEM-T-easy-pn9; an SspI-NotI fragment, which contains the 3' FLAG-tagged hnhpl, excised from pGEM-pn6F; and pSI digested with EcoRI-NotI.

The resulting plasmids were transfected into human embryonic kidney 293 cells using the FuGENE transfection reagent (Boehringer Mannheim). Forty-eight hours following transfection, cells were harvested and proteins were analyzed by western blot. Lysates of 2.5 x 10^5 cells were separated by SDS-PAGE, transferred onto a nylon membrane and incubated with anti-FLAG antibody at 1:1000 dilution (Kodak anti-FLAG M2, cat. IB13025, final concentration 10 µg/ml). Proteins of approximately 65 kDa and 60 kDa were detected in cells transfected with pSI-pn6F and pSI-pn9F, respectively. These proteins are similar in size to those predicted by the calculated molecular weights of the translation products of the corresponding open reading frames. These results demonstrate that both the entire hnhpl cDNA and the pn9 splice form are successfully transcribed and translated in human 293 cells. However, unlike heparanase, the Hnhpl protein products do not undergo major processing in these cells.

Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims. All publications cited herein are incorporated by reference in their entirety.
WHAT IS CLAIMED IS:

1. An isolated nucleic acid comprising a polynucleotide hybridizable with SEQ ID NOs:1, 4, 6 or portions thereof at 68 °C in 6 x SSC, 1 % SDS, 5 x Denhardt's, 10 % dextran sulfate, 100 µg/ml salmon sperm DNA, and 32P-labeled probe and wash at 68 °C with 3 x SSC and 0.1 % SDS.

2. An isolated nucleic acid comprising a polynucleotide at least 60 % identical with SEQ ID NOs:1, 4, 6 or portions thereof as determined using the Bestfit procedure of the DNA sequence analysis software package developed by the Genetic Computer Group (GCG) at the University of Wisconsin (gap creation penalty = 50, gap extension penalty = 3).

3. The isolated nucleic acid of claim 2, wherein said polynucleotide is as set forth in SEQ ID NOs:1, 4, 6 or portions thereof.

4. An isolated nucleic acid comprising a polynucleotide
encoding a polypeptide being at least 60 % homologous with SEQ ID
NOs:3, 5, 7 or portions thereof as determined using the Bestfit procedure
of the DNA sequence analysis software package developed by the Genetic
Computer Group (GCG) at the University of Wisconsin (gap creation
penalty - 50, gap extension penalty - 3).

5. A recombinant protein comprising a polypeptide encoded by
the polynucleotide of claim 1.

6. A recombinant protein comprising a polypeptide encoded by 
the polynucleotide of claim 2. 

7. A recombinant protein comprising a polypeptide encoded by
the polynucleotide of claim 3.

8. A recombinant protein comprising a polypeptide encoded by 
the polynucleotide of claim 4. 
9. A recombinant protein comprising a polypeptide at least 60 %
homologous with SEQ ID NOs:3, 5, 7 or portions thereof as determined
using the Bestfit procedure of the DNA sequence analysis software
package developed by the Genetic Computer Group (GCG) at the
University of Wisconsin (gap creation penalty - 50, gap extension penalty -
3).

10. The recombinant protein of claim 9, wherein said
polypeptide is as set forth in SEQ ID NOs:3, 5, 7 or portions thereof.

11. A nucleic acid construct comprising the isolated nucleic acid
of claim 1.

12. A nucleic acid construct comprising the isolated nucleic acid
of claim 2.

13. A nucleic acid construct comprising the isolated nucleic acid
of claim 3.

14. A nucleic acid construct comprising the isolated nucleic acid
of claim 4.

15. A host cell comprising the nucleic acid construct of claim 11.

16. A host cell comprising the nucleic acid construct of claim 12.

17. A host cell comprising the nucleic acid construct of claim 13.

18. A host cell comprising the nucleic acid construct of claim 14.

19. An antisense oligonucleotide comprising a polynucleotide or
a polynucleotide analog of at least 10 bases being hybridizable in vivo,
under physiological conditions, with:

(i) a portion of a polynucleotide strand encoding a polypeptide
at least 60 % homologous with SEQ ID NOs:3, 5, 7 or
portions thereof as determined using the Bestfit procedure of
the DNA sequence analysis software package developed by
the Genetic Computer Group (GCG) at the University of
Wisconsin (gap creation penalty - 50, gap extension penalty -
3); or

(ii) a portion of a polynucleotide strand at least 60 % identical
with SEQ ID NOs:1, 4, 6 or portions thereof as determined
using the Bestfit procedure of the DNA sequence analysis
software package developed by the Genetic Computer Group
(GCG) at the University of Wisconsin (gap creation penalty -
50, gap extension penalty - 3).

20. A ribozyme comprising the antisense oligonucleotide of 
claim 19 and a ribozyme sequence. 

21. An antisense nucleic acid construct comprising a promoter
sequence and a polynucleotide sequence directing the synthesis of an
antisense RNA sequence of at least 10 bases being hybridizable in vivo,
under physiological conditions, with:

(i) a portion of a polynucleotide strand encoding a polypeptide
at least 60 % homologous with SEQ ID NOs:3, 5, 7 or
portions thereof as determined using the Bestfit procedure of
the DNA sequence analysis software package developed by
the Genetic Computer Group (GCG) at the University of
Wisconsin (gap creation penalty - 50, gap extension penalty -
3); or

(ii) a portion of a polynucleotide strand at least 60 % identical
with SEQ ID NOs:1, 4, 6 or portions thereof as determined
using the Bestfit procedure of the DNA sequence analysis
software package developed by the Genetic Computer Group
(GCG) at the University of Wisconsin (gap creation penalty -
50, gap extension penalty - 3).
CGCTTAATTCTAGAAGAGGGATTGA 25

ATGAGGGTGCTTTGTGCCTTCCCTGAAGCCATGCCCTCCAGCAACTCCCGCCCCCCCGCG 85
MRVLCAFPEAMPSSNSRPPA

TGCCTAGCCCCGGGGGCTCTCTACTTGGCTCTGTTGCTCCATCTCTCCCTTTCCTCCCAG 145
CLAPGALYLALLLHLSLSSQ

GCTGGAGACAGGAGACCCTTGCCTGTAGACAGAGCTGCAGGTTTGAAGGAAAAGACCCTG 205
AGDRRPLPVDRAAGLKEKTL

ATTCTACTTGATGTGAGCACCAAGAACCCAGTCAGGACAGTCAATGAGAACTTCCTCTCT 265
ILLDVSTKNPVRTVNENFLS

CTGCAGCTGGATCCGTCCATCATTCATGATGGCTGGCTCGATTTCCTAAGCTCCAAGCGC 325
LQLDPSIIHDGWLDFLSSKR

TTGGTGACCCTGGCCCGGGGACTTTCGCCCGCCTTTCTGCGCTTCGGGGGCAAAAGGACC 385
LVTLARGLSPAFLRFGGKRT

GACTTCCTGCAGTTCCAGAACCTGAGGAACCCGGCGAAAAGCCGCGGGGGCCCGGGCCCG 445
DFLQFQNLRNPAKSRGGPGP

GATTACTATCTCAAAAACTATGAGGATGACATTGTTCGAAGTGATGTTGCCTTAGATAAA 505
DYYLKNYEDDIVRSDVALDK

CAGAAAGGCTGCAAGATTGCCCAGCACCCTGATGTTATGCTGGAGCTCCAAAGGGAGAAG 565
QKGCKIAQHPDVMLELQREK

GCAGCTCAGATGCATCTGGTTCTTCTAAAGGAGCAATTCTCCAATACTTACAGTAATCTC 625
AAQMHLVLLKEQFSNTYSNL

ATATTAACAGCCAGGTCTCTAGACAAACTTTATAACTTTGCTGATTGCTCTGGACTCCAC 685
ILTARSLDKLYNFADCSGLH

CTGATATTTGCTCTAAATGCACTGCGTCGTAATCCCAATAACTCCTGGAACAGTTCTAGT 745
LIFALNALRRNPNNSWNSSS

GCCCTGAGTCTGTTGAAGTACAGCGCCAGCAAAAAGTACAACATTTCTTGGGAACTGGGT 805
ALSLLKYSASKKYNISWELG

AATGAGCCAAATAACTATCGGACCATGCATGGCCGGGCAGTAAATGGCAGCCAGTTGGGA 865
NEPNNYRTMHGRAVNGSQLG

AAGGATTACATCCAGCTGAAGAGCCTGTTGCAGCCCATCCGGATTTATTCCAGAGCCAGC 925
KDYIQLKSLLQPIRIYSRAS

TTATATGGCCCTAATATTGGGCGGCCGAGGAAGAATGTCATCGCCCTCCTAGATGGATTC 985
LYGPNIGRPRKNVIALLDGF

ATGAAGGTGGCAGGAAGTACAGTAGATGCAGTTACCTGGCAACATTGCTACATTGATGGC 1045
MKVAGSTVDAVTWQHCYIDG

CGGGTGGTCAAGGTGATGGACTTCCTGAAAACTCGCCTGTTAGACACACTCTCTGACCAG 1105
RVVKVMDFLKTRLLDTLSDQ

ATTAGGAAAATTCAGAAAGTGGTTAATACATACACTCCAGGAAAGAAGATTTGGCTTGAA 1165
IRKIQKVVNTYTPGKKIWLE

GGTGTGGTGACCACCTCAGCTGGAGGCACAAACAATCTATCCGATTCCTATGCTGCAGGA 1225
GVVTTSAGGTNNLSDSYAAG

TTCTTATGGTTGAACACTTTAGGAATGCTGGCCAATCAGGGCATTGATGTCGTGATACGG 1285
FLWLNTLGMLANQGIDVVIR

CACTCATTTTTTGACCATGGATACAATCACCTCGTGGACCAGAATTTTAACCCATTACCA 1345
HSFFDHGYNHLVDQNFNPLP

GACTACTGGCTCTCTCTCCTCTACAAGCGCCTGATCGGCCCCAAAGTCTTGGCTGTGCAT 1405
DYWLSLLYKRLIGPKVLAVH

GTGGCTGGGCTCCAGCGGAAGCCACGGCCTGGCCGAGTGATCCGGGACAAACTAAGGATT 1465
VAGLQRKPRPGRVIRDKLRI

WO 01/00643 PCT/IL00/00358

TATGCTCACTGCACAAACCACCACAACCACAACTACGTTCGTGGGTCCATTACACTTTTT 1525
YAHCTNHHNHNYVRGSITLF

ATCATCAACTTGCATCGATCAAGAAAGAAAATCAAGCTGGCTGGGACTCTCAGAGACAAG 1585
IINLHRSRKKIKLAGTLRDK

CTGGTTCACCAGTACCTGCTGCAGCCCTATGGGCAGGAGGGCCTAAAGTCCAAGTCAGTG 1645
LVHQYLLQPYGQEGLKSKSV

CAACTGAATGGCCAGCCCTTAGTGATGGTGGACGACGGGACCCTCCCAGAATTGAAGCCC 1705
QLNGQPLVMVDDGTLPELKP

CGCCCCCTTCGGGCCGGCCGGACATTGGTCATCCCTCCAGTCACCATGGGCTTTTTTGTG 1765
RPLRAGRTLVIPPVTMGFFV

GTCAAGAATGTCAATGCTTTGGCCTGCCGCTACCGATAAGCTATCCTCACACTCATGGCT 1825
VKNVNALACRYR*

ACCAGTGGGCCTGCTGGGCTGCTTCCACTCCTCCACTCCAGTAGTATCCTCTGTTTTCAG 1885

ACATCCTAGCAACCAGCCCCTGCTGCCCCATCCTGCTGGAATCAACACAGACTTGCTCTC 1945

CAAAGAGACTAAATGTCATAGCGTGATCTTAGCCTAGGTAGGCCACATCCATCCCAAAGG 2005

AAAATGTAGACATCACCTGTACCTATATAAGGATAAAGGCATGTGTATAGAGCAA 2060
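Each nucleotide line of the open reading frame above is shown with its deduced amino-acid line. The sketch below, offered only as an illustration, reproduces that step with the standard genetic code: it builds the 64-codon table from the conventional TCAG ordering and translates codon by codon until the first stop. The names `CODON_TABLE` and `translate` are hypothetical, ambiguous bases (N, R, Y and so on) are deliberately not handled, and nothing here is presented as the procedure actually used by the applicants.

```python
# Illustrative sketch: translate an open reading frame with the standard
# genetic code (NCBI translation table 1).
BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order: TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str, stop_at_stop: bool = True) -> str:
    """Translate in frame 1; raises KeyError on non-ACGT characters."""
    dna = dna.upper().replace(" ", "")
    protein = []
    for pos in range(0, len(dna) - 2, 3):  # ignore a trailing partial codon
        aa = CODON_TABLE[dna[pos:pos + 3]]
        if aa == "*" and stop_at_stop:
            break  # end of the open reading frame
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # Nucleotides 26-85 of SEQ ID NO:1, i.e. the first coding line above.
    first_line = "ATGAGGGTGCTTTGTGCCTTCCCTGAAGCCATGCCCTCCAGCAACTCCCGCCCCCCCGCG"
    print(translate(first_line))  # prints MRVLCAFPEAMPSSNSRPPA
```

Running the same function over the line ending at nucleotide 1825 stops at the TAA codon and returns VKNVNALACRYR, matching the C-terminus shown above.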
  1 MRVLCAFPEAMPSSNSRPPACLAPGALYLALLLHLSLSSQAGDRRPLPVD 50
  1 MLLRSKPALPPPLMLLLLGPLGPLSPGALP 30

 51 RAAGLKEKTLILLDVSTKNPVRTVNENFLSLQLDPSIIHD.GWLDFLSSK 99
 31 RPA..QAQDVVDLDFFTQEPLHLVSPSFLSVTIDANLATDPRFLILLGSP 78

100 RLVTLARGLSPAFLRFGGKRTDFLQFQNLRNPAKSRGGPGPDYYLKNYED 149
 79 KLRTLARGLSPAYLRFGGTKTDFLIF....DPKKESTFEERSYWQSQVNQ 124

150 DIVRSDVALDKQKGCKIAQHPDVMLELQREKAAQMHLVLLKEQFSNTYSN 199
125 DI............CKYGSIPPDVEEKLRLEWPYQEQLLLREHYQKKFKN 162

200 LILTARSLDKLYNFADCSGLHLIFALNALRRNPNNSWNSSSALSLLKYSA 249
163 STYSRSSVDVLYTFANCSGLDLIFGLNALLRTADLQWNSSNAQLLLDYCS 212

250 SKKYNISWELGNEPNNYRTMHGRAVNGSQLGKDYIQLKSLLQPIRIYSRA 299
213 SKGYNISWELGNEPNSFLKKADIFINGSQLGEDFIQLHKLLRK.STFKNA 261

300 SLYGPNIGRPRKNVIALLDGFMKVAGSTVDAVTWQHCYIDGRVVKVMDFL 349
262 KLYGPDVGQPRRKTAKMLKSFLKAGGEVTDSVTWHHYYLNGRTATREDFL 311

350 KTRLLDTLSDQIRKIQKVVNTYTPGKKIWLEGVVTTSAGGTNNLSDSYAA 399
312 NPDVLDIFISSVQKVFQVVESTRPGKKVWLGETSSAYGGGAPLLSDTFAA 361

400 GFLWLNTLGMLANQGIDVVIRHSFFDHGYNHLVDQNFNPLPDYWLSLLYK 449
362 GFMWLDKLGLSARMGIEVVMRQVFFGAGNYHLVDENFDPLPDYWLSLLFK 411

450 RLIGPKVLAVHVAGLQRKPRPGRVIRDKLRIYAHCTNHHNHNYVRGSITL 499
412 KLVGTKVLMASVQGSKRR.........KLRVYLHCTNTDNPRYKEGDLTL 452

500 FIINLHRSRKKIKLAGTLRDKLVHQYLLQPYGQEGLKSKSVQLNGQPLVM 549
453 YAINLHNVTKYLRLPYPFSNKQVDKYLLRPLGPHGLLSKSVQLNGLTLKM 502

550 VDDGTLPELKPRPLRAGRTLVIPPVTMGFFVVKNVNALACRYR
503 VDDQTLPPLMEKPLRPGSSLGLPAFSYSFFVTRNAKVAACI
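The claims express relatedness as percent identity or homology determined with GCG Bestfit (gap creation penalty 50, gap extension penalty 3). Bestfit itself computes the optimal local alignment; the sketch below covers only the much simpler final step of counting identity over an alignment that already exists, such as the paired rows above. Assumptions, stated plainly: gaps are written as "." or "-", and identity is taken as identical columns divided by columns where neither row has a gap. This is one of several conventions and is not claimed to reproduce Bestfit's own normalization; `percent_identity` is a hypothetical name.

```python
GAP_CHARS = {".", "-"}


def percent_identity(top: str, bottom: str) -> float:
    """Percent of ungapped aligned columns whose residues are identical."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    compared = matches = 0
    for a, b in zip(top.upper(), bottom.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # skip columns with a gap in either row
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Second block of the alignment (Hnhp1 residues 51-99 vs heparanase 31-78).
    top = "RAAGLKEKTLILLDVSTKNPVRTVNENFLSLQLDPSIIHD.GWLDFLSSK"
    bottom = "RPA..QAQDVVDLDFFTQEPLHLVSPSFLSVTIDANLATDPRFLILLGSP"
    print(f"{percent_identity(top, bottom):.1f}%")  # prints 31.9%
```

Because the denominator excludes gapped columns, a value from a single block like this is not directly comparable with a whole-length identity figure from Bestfit.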
SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Iris Pecker et al.

(ii) TITLE OF INVENTION: POLYNUCLEOTIDES AND POLYPEPTIDES
ENCODED THEREBY

(iii) NUMBER OF SEQUENCES: 24

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Sol Sheinbein c/o Anthony Castorina
(B) STREET: 2001 Jefferson Davis Highway, Suite 207
(C) CITY: Arlington
(D) STATE: Virginia
(E) COUNTRY: United States of America
(F) ZIP: 22202

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: 1.44 megabyte, 3.5" microdisk
(B) COMPUTER: Twinhead* Slimnote-890TX
(C) OPERATING SYSTEM: MS DOS version 6.2, Windows version 3.11
(D) SOFTWARE: Word for Windows version 2.0 converted to an ASCII file

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: 60/140,801
(B) FILING DATE: June 25, 1999

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Sheinbein, Sol
(B) REGISTRATION NUMBER: 25,457
(C) REFERENCE/DOCKET NUMBER: 20105

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: 972-3-6127676
(B) TELEFAX: 972-3-6127575
(C) TELEX:



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2060

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

CGCTTAATTC TAGAAGAGGG ATTGAATGAG GGTGCTTTGT GCCTTCCCTG 50 

AAGCCATGCC CTCCAGCAAC TCCCGCCCCC CCGCGTGCCT AGCCCCGGGG 100 

GCTCTCTACT TGGCTCTGTT GCTCCATCTC TCCCTTTCCT CCCAGGCTGG 150 

AGACAGGAGA CCCTTGCCTG TAGACAGAGC TGCAGGTTTG AAGGAAAAGA 200 

CCCTGATTCT ACTTGATGTG AGCACCAAGA ACCCAGTCAG GACAGTCAAT 250 

GAGAACTTCC TCTCTCTGCA GCTGGATCCG TCCATCATTC ATGATGGCTG 300 

GCTCGATTTC CTAAGCTCCA AGCGCTTGGT GACCCTGGCC CGGGGACTTT 350 

CGCCCGCCTT TCTGCGCTTC GGGGGCAAAA GGACCGACTT CCTGCAGTTC 400 

CAGAACCTGA GGAACCCGGC GAAAAGCCGC GGGGGCCCGG GCCCGGATTA 450

CTATCTCAAA AACTATGAGG ATGACATTGT TCGAAGTGAT GTTGCCTTAG 500 

ATAAACAGAA AGGCTGCAAG ATTGCCCAGC ACCCTGATGT TATGCTGGAG 550 

CTCCAAAGGG AGAAGGCAGC TCAGATGCAT CTGGTTCTTC TAAAGGAGCA 600

ATTCTCCAAT ACTTACAGTA ATCTCATATT AACAGCCAGG TCTCTAGACA 650 

AACTTTATAA CTTTGCTGAT TGCTCTGGAC TCCACCTGAT ATTTGCTCTA 700 

AATGCACTGC GTCGTAATCC CAATAACTCC TGGAACAGTT CTAGTGCCCT 750

GAGTCTGTTG AAGTACAGCG CCAGCAAAAA GTACAACATT TCTTGGGAAC 800 

TGGGTAATGA GCCAAATAAC TATCGGACCA TGCATGGCCG GGCAGTAAAT 850 

GGCAGCCAGT TGGGAAAGGA TTACATCCAG CTGAAGAGCC TGTTGCAGCC 900 

CATCCGGATT TATTCCAGAG CCAGCTTATA TGGCCCTAAT ATTGGGCGGC 950 

CGAGGAAGAA TGTCATCGCC CTCCTAGATG GATTCATGAA GGTGGCAGGA 1000

AGTACAGTAG ATGCAGTTAC CTGGCAACAT TGCTACATTG ATGGCCGGGT 1050 

GGTCAAGGTG ATGGACTTCC TGAAAACTCG CCTGTTAGAC ACACTCTCTG 1100 

ACCAGATTAG GAAAATTCAG AAAGTGGTTA ATACATACAC TCCAGGAAAG 1150 

AAGATTTGGC TTGAAGGTGT GGTGACCACC TCAGCTGGAG GCACAAACAA 1200 

TCTATCCGAT TCCTATGCTG CAGGATTCTT ATGGTTGAAC ACTTTAGGAA 1250 

TGCTGGCCAA TCAGGGCATT GATGTCGTGA TACGGCACTC ATTTTTTGAC 1300

CATGGATACA ATCACCTCGT GGACCAGAAT TTTAACCCAT TACCAGACTA 1350 

CTGGCTCTCT CTCCTCTACA AGCGCCTGAT CGGCCCCAAA GTCTTGGCTG 1400

TGCATGTGGC TGGGCTCCAG CGGAAGCCAC GGCCTGGCCG AGTGATCCGG 1450

GACAAACTAA GGATTTATGC TCACTGCACA AACCACCACA ACCACAACTA 1500 

CGTTCGTGGG TCCATTACAC TTTTTATCAT CAACTTGCAT CGATCAAGAA 1550 

AGAAAATCAA GCTGGCTGGG ACTCTCAGAG ACAAGCTGGT TCACCAGTAC 1600 

CTGCTGCAGC CCTATGGGCA GGAGGGCCTA AAGTCCAAGT CAGTGCAACT 1650 

GAATGGCCAG CCCTTAGTGA TGGTGGACGA CGGGACCCTC CCAGAATTGA 1700 

AGCCCCGCCC CCTTCGGGCC GGCCGGACAT TGGTCATCCC TCCAGTCACC 1750

ATGGGCTTTT TTGTGGTCAA GAATGTCAAT GCTTTGGCCT GCCGCTACCG 1800 

ATAAGCTATC CTCACACTCA TGGCTACCAG TGGGCCTGCT GGGCTGCTTC 1850

CACTCCTCCA CTCCAGTAGT ATCCTCTGTT TTCAGACATC CTAGCAACCA 1900

GCCCCTGCTG CCCCATCCTG CTGGAATCAA CACAGACTTG CTCTCCAAAG 1950

AGACTAAATG TCATAGCGTG ATCTTAGCCT AGGTAGGCCA CATCCATCCC 2000

AAAGGAAAAT GTAGACATCA CCTGTACCTA TATAAGGATA AAGGCATGTG 2050

TATAGAGCAA 2060



(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2060

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

C GCT TAA TTC TAG AAG AGG GAT TGA                              25

ATG AGG GTG CTT TGT GCC TTC CCT GAA GCC ATG CCC TCC AGC AAC    70
Met Arg Val Leu Cys Ala Phe Pro Glu Ala Met Pro Ser Ser Asn
1               5                   10                  15

TCC CGC CCC CCC GCG TGC CTA GCC CCG GGG GCT CTC TAC TTG GCT   115
Ser Arg Pro Pro Ala Cys Leu Ala Pro Gly Ala Leu Tyr Leu Ala
                20                  25                  30

CTG TTG CTC CAT CTC TCC CTT TCC TCC CAG GCT GGA GAC AGG AGA   160
Leu Leu Leu His Leu Ser Leu Ser Ser Gln Ala Gly Asp Arg Arg
                35                  40                  45

CCC TTG CCT GTA GAC AGA GCT GCA GGT TTG AAG GAA AAG ACC CTG   205
Pro Leu Pro Val Asp Arg Ala Ala Gly Leu Lys Glu Lys Thr Leu
                50                  55                  60

ATT CTA CTT GAT GTG AGC ACC AAG AAC CCA GTC AGG ACA GTC AAT   250
Ile Leu Leu Asp Val Ser Thr Lys Asn Pro Val Arg Thr Val Asn
                65                  70                  75

GAG AAC TTC CTC TCT CTG CAG CTG GAT CCG TCC ATC ATT CAT GAT   295
Glu Asn Phe Leu Ser Leu Gln Leu Asp Pro Ser Ile Ile His Asp
                80                  85                  90

GGC TGG CTC GAT TTC CTA AGC TCC AAG CGC TTG GTG ACC CTG GCC   340
Gly Trp Leu Asp Phe Leu Ser Ser Lys Arg Leu Val Thr Leu Ala
                95                  100                 105

CGG GGA CTT TCG CCC GCC TTT CTG CGC TTC GGG GGC AAA AGG ACC   385
Arg Gly Leu Ser Pro Ala Phe Leu Arg Phe Gly Gly Lys Arg Thr
                110                 115                 120

GAC TTC CTG CAG TTC CAG AAC CTG AGG AAC CCG GCG AAA AGC CGC   430
Asp Phe Leu Gln Phe Gln Asn Leu Arg Asn Pro Ala Lys Ser Arg
                125                 130                 135

GGG GGC CCG GGC CCG GAT TAC TAT CTC AAA AAC TAT GAG GAT GAC   475
Gly Gly Pro Gly Pro Asp Tyr Tyr Leu Lys Asn Tyr Glu Asp Asp
                140                 145                 150

ATT GTT CGA AGT GAT GTT GCC TTA GAT AAA CAG AAA GGC TGC AAG   520
Ile Val Arg Ser Asp Val Ala Leu Asp Lys Gln Lys Gly Cys Lys
                155                 160                 165

ATT GCC CAG CAC CCT GAT GTT ATG CTG GAG CTC CAA AGG GAG AAG   565
Ile Ala Gln His Pro Asp Val Met Leu Glu Leu Gln Arg Glu Lys
                170                 175                 180

GCA GCT CAG ATG CAT CTG GTT CTT CTA AAG GAG CAA TTC TCC AAT   610
Ala Ala Gln Met His Leu Val Leu Leu Lys Glu Gln Phe Ser Asn
                185                 190                 195

ACT TAC AGT AAT CTC ATA TTA ACA GCC AGG TCT CTA GAC AAA CTT   655
Thr Tyr Ser Asn Leu Ile Leu Thr Ala Arg Ser Leu Asp Lys Leu
                200                 205                 210

TAT AAC TTT GCT GAT TGC TCT GGA CTC CAC CTG ATA TTT GCT CTA   700
Tyr Asn Phe Ala Asp Cys Ser Gly Leu His Leu Ile Phe Ala Leu
                215                 220                 225

AAT GCA CTG CGT CGT AAT CCC AAT AAC TCC TGG AAC AGT TCT AGT   745
Asn Ala Leu Arg Arg Asn Pro Asn Asn Ser Trp Asn Ser Ser Ser
                230                 235                 240

GCC CTG AGT CTG TTG AAG TAC AGC GCC AGC AAA AAG TAC AAC ATT   790
Ala Leu Ser Leu Leu Lys Tyr Ser Ala Ser Lys Lys Tyr Asn Ile
                245                 250                 255

TCT TGG GAA CTG GGT AAT GAG CCA AAT AAC TAT CGG ACC ATG CAT   835
Ser Trp Glu Leu Gly Asn Glu Pro Asn Asn Tyr Arg Thr Met His
                260                 265                 270

GGC CGG GCA GTA AAT GGC AGC CAG TTG GGA AAG GAT TAC ATC CAG   880
Gly Arg Ala Val Asn Gly Ser Gln Leu Gly Lys Asp Tyr Ile Gln
                275                 280                 285

CTG AAG AGC CTG TTG CAG CCC ATC CGG ATT TAT TCC AGA GCC AGC   925
Leu Lys Ser Leu Leu Gln Pro Ile Arg Ile Tyr Ser Arg Ala Ser
                290                 295                 300

TTA TAT GGC CCT AAT ATT GGG CGG CCG AGG AAG AAT GTC ATC GCC   970
Leu Tyr Gly Pro Asn Ile Gly Arg Pro Arg Lys Asn Val Ile Ala
                305                 310                 315

CTC CTA GAT GGA TTC ATG AAG GTG GCA GGA AGT ACA GTA GAT GCA  1015
Leu Leu Asp Gly Phe Met Lys Val Ala Gly Ser Thr Val Asp Ala
                320                 325                 330

GTT ACC TGG CAA CAT TGC TAC ATT GAT GGC CGG GTG GTC AAG GTG  1060
Val Thr Trp Gln His Cys Tyr Ile Asp Gly Arg Val Val Lys Val
                335                 340                 345

ATG GAC TTC CTG AAA ACT CGC CTG TTA GAC ACA CTC TCT GAC CAG  1105
Met Asp Phe Leu Lys Thr Arg Leu Leu Asp Thr Leu Ser Asp Gln
                350                 355                 360

ATT AGG AAA ATT CAG AAA GTG GTT AAT ACA TAC ACT CCA GGA AAG  1150
Ile Arg Lys Ile Gln Lys Val Val Asn Thr Tyr Thr Pro Gly Lys
                365                 370                 375

AAG ATT TGG CTT GAA GGT GTG GTG ACC ACC TCA GCT GGA GGC ACA  1195
Lys Ile Trp Leu Glu Gly Val Val Thr Thr Ser Ala Gly Gly Thr
                380                 385                 390

AAC AAT CTA TCC GAT TCC TAT GCT GCA GGA TTC TTA TGG TTG AAC  1240
Asn Asn Leu Ser Asp Ser Tyr Ala Ala Gly Phe Leu Trp Leu Asn
                395                 400                 405

ACT TTA GGA ATG CTG GCC AAT CAG GGC ATT GAT GTC GTG ATA CGG  1285
Thr Leu Gly Met Leu Ala Asn Gln Gly Ile Asp Val Val Ile Arg
                410                 415                 420

CAC TCA TTT TTT GAC CAT GGA TAC AAT CAC CTC GTG GAC CAG AAT  1330
His Ser Phe Phe Asp His Gly Tyr Asn His Leu Val Asp Gln Asn
                425                 430                 435

TTT AAC CCA TTA CCA GAC TAC TGG CTC TCT CTC CTC TAC AAG CGC  1375
Phe Asn Pro Leu Pro Asp Tyr Trp Leu Ser Leu Leu Tyr Lys Arg
                440                 445                 450

CTG ATC GGC CCC AAA GTC TTG GCT GTG CAT GTG GCT GGG CTC CAG  1420
Leu Ile Gly Pro Lys Val Leu Ala Val His Val Ala Gly Leu Gln
                455                 460                 465

CGG AAG CCA CGG CCT GGC CGA GTG ATC CGG GAC AAA CTA AGG ATT  1465
Arg Lys Pro Arg Pro Gly Arg Val Ile Arg Asp Lys Leu Arg Ile
                470                 475                 480

TAT GCT CAC TGC ACA AAC CAC CAC AAC CAC AAC TAC GTT CGT GGG  1510
Tyr Ala His Cys Thr Asn His His Asn His Asn Tyr Val Arg Gly
                485                 490                 495

TCC ATT ACA CTT TTT ATC ATC AAC TTG CAT CGA TCA AGA AAG AAA  1555
Ser Ile Thr Leu Phe Ile Ile Asn Leu His Arg Ser Arg Lys Lys
                500                 505                 510

ATC AAG CTG GCT GGG ACT CTC AGA GAC AAG CTG GTT CAC CAG TAC  1600
Ile Lys Leu Ala Gly Thr Leu Arg Asp Lys Leu Val His Gln Tyr
                515                 520                 525

CTG CTG CAG CCC TAT GGG CAG GAG GGC CTA AAG TCC AAG TCA GTG  1645
Leu Leu Gln Pro Tyr Gly Gln Glu Gly Leu Lys Ser Lys Ser Val
                530                 535                 540

CAA CTG AAT GGC CAG CCC TTA GTG ATG GTG GAC GAC GGG ACC CTC  1690
Gln Leu Asn Gly Gln Pro Leu Val Met Val Asp Asp Gly Thr Leu
                545                 550                 555

CCA GAA TTG AAG CCC CGC CCC CTT CGG GCC GGC CGG ACA TTG GTC  1735
Pro Glu Leu Lys Pro Arg Pro Leu Arg Ala Gly Arg Thr Leu Val
                560                 565                 570

ATC CCT CCA GTC ACC ATG GGC TTT TTT GTG GTC AAG AAT GTC AAT  1780
Ile Pro Pro Val Thr Met Gly Phe Phe Val Val Lys Asn Val Asn
                575                 580                 585

GCT TTG GCC TGC CGC TAC CGA TAA GCT ATC CTC ACA CTC ATG GCT  1825
Ala Leu Ala Cys Arg Tyr Arg
                590

ACC AGT GGG CCT GCT GGG CTG CTT CCA CTC CTC CAC TCC AGT AGT  1870

ATC CTC TGT TTT CAG ACA TCC TAG CAA CCA GCC CCT GCT GCC CCA  1915

TCC TGC TGG AAT CAA CAC AGA CTT GCT CTC CAA AGA GAC TAA ATG  1960

TCA TAG CGT GAT CTT AGC CTA GGT AGG CCA CAT CCA TCC CAA AGG  2005

AAA ATG TAG ACA TCA CCT GTA CCT ATA TAA GGA TAA AGG CAT GTG  2050

TAT AGA GCA A                                                2060



(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 592

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

Met Arg Val Leu Cys Ala Phe Pro Glu Ala Met Pro Ser Ser Asn
1               5                   10                  15

Ser Arg Pro Pro Ala Cys Leu Ala Pro Gly Ala Leu Tyr Leu Ala
                20                  25                  30

Leu Leu Leu His Leu Ser Leu Ser Ser Gln Ala Gly Asp Arg Arg
                35                  40                  45

Pro Leu Pro Val Asp Arg Ala Ala Gly Leu Lys Glu Lys Thr Leu
                50                  55                  60

Ile Leu Leu Asp Val Ser Thr Lys Asn Pro Val Arg Thr Val Asn
                65                  70                  75

Glu Asn Phe Leu Ser Leu Gln Leu Asp Pro Ser Ile Ile His Asp
                80                  85                  90

Gly Trp Leu Asp Phe Leu Ser Ser Lys Arg Leu Val Thr Leu Ala
                95                  100                 105

Arg Gly Leu Ser Pro Ala Phe Leu Arg Phe Gly Gly Lys Arg Thr
                110                 115                 120

Asp Phe Leu Gln Phe Gln Asn Leu Arg Asn Pro Ala Lys Ser Arg
                125                 130                 135

Gly Gly Pro Gly Pro Asp Tyr Tyr Leu Lys Asn Tyr Glu Asp Asp
                140                 145                 150

Ile Val Arg Ser Asp Val Ala Leu Asp Lys Gln Lys Gly Cys Lys
                155                 160                 165

Ile Ala Gln His Pro Asp Val Met Leu Glu Leu Gln Arg Glu Lys
                170                 175                 180

Ala Ala Gln Met His Leu Val Leu Leu Lys Glu Gln Phe Ser Asn
                185                 190                 195

Thr Tyr Ser Asn Leu Ile Leu Thr Ala Arg Ser Leu Asp Lys Leu
                200                 205                 210

Tyr Asn Phe Ala Asp Cys Ser Gly Leu His Leu Ile Phe Ala Leu
                215                 220                 225

Asn Ala Leu Arg Arg Asn Pro Asn Asn Ser Trp Asn Ser Ser Ser
                230                 235                 240

Ala Leu Ser Leu Leu Lys Tyr Ser Ala Ser Lys Lys Tyr Asn Ile
                245                 250                 255

Ser Trp Glu Leu Gly Asn Glu Pro Asn Asn Tyr Arg Thr Met His
                260                 265                 270

Gly Arg Ala Val Asn Gly Ser Gln Leu Gly Lys Asp Tyr Ile Gln
                275                 280                 285

Leu Lys Ser Leu Leu Gln Pro Ile Arg Ile Tyr Ser Arg Ala Ser
                290                 295                 300

Leu Tyr Gly Pro Asn Ile Gly Arg Pro Arg Lys Asn Val Ile Ala
                305                 310                 315

Leu Leu Asp Gly Phe Met Lys Val Ala Gly Ser Thr Val Asp Ala
                320                 325                 330

Val Thr Trp Gln His Cys Tyr Ile Asp Gly Arg Val Val Lys Val
                335                 340                 345

Met Asp Phe Leu Lys Thr Arg Leu Leu Asp Thr Leu Ser Asp Gln
                350                 355                 360

Ile Arg Lys Ile Gln Lys Val Val Asn Thr Tyr Thr Pro Gly Lys
                365                 370                 375

Lys Ile Trp Leu Glu Gly Val Val Thr Thr Ser Ala Gly Gly Thr
                380                 385                 390

Asn Asn Leu Ser Asp Ser Tyr Ala Ala Gly Phe Leu Trp Leu Asn
                395                 400                 405

Thr Leu Gly Met Leu Ala Asn Gln Gly Ile Asp Val Val Ile Arg
                410                 415                 420

His Ser Phe Phe Asp His Gly Tyr Asn His Leu Val Asp Gln Asn
                425                 430                 435

Phe Asn Pro Leu Pro Asp Tyr Trp Leu Ser Leu Leu Tyr Lys Arg
                440                 445                 450

Leu Ile Gly Pro Lys Val Leu Ala Val His Val Ala Gly Leu Gln
                455                 460                 465

Arg Lys Pro Arg Pro Gly Arg Val Ile Arg Asp Lys Leu Arg Ile
                470                 475                 480

Tyr Ala His Cys Thr Asn His His Asn His Asn Tyr Val Arg Gly
                485                 490                 495

Ser Ile Thr Leu Phe Ile Ile Asn Leu His Arg Ser Arg Lys Lys
                500                 505                 510

Ile Lys Leu Ala Gly Thr Leu Arg Asp Lys Leu Val His Gln Tyr
                515                 520                 525

Leu Leu Gln Pro Tyr Gly Gln Glu Gly Leu Lys Ser Lys Ser Val
                530                 535                 540

Gln Leu Asn Gly Gln Pro Leu Val Met Val Asp Asp Gly Thr Leu
                545                 550                 555

Pro Glu Leu Lys Pro Arg Pro Leu Arg Ala Gly Arg Thr Leu Val
                560                 565                 570

Ile Pro Pro Val Thr Met Gly Phe Phe Val Val Lys Asn Val Asn
                575                 580                 585

Ala Leu Ala Cys Arg Tyr Arg
                590

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 



(A) LENGTH: 1898 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

CGCTTAATTC TAGAAGAGGG ATTGAATGAG GGTGCTTTGT GCCTTCCCTG 50 

AAGCCATGCC CTCCAGCAAC TCCCGCCCCC CCGCGTGCCT AGCCCCGGGG 100 

GCTCTCTACT TGGCTCTGTT GCTCCATCTC TCCCTTTCCT CCCAGGCTGG 150 

AGACAGGAGA CCCTTGCCTG TAGACAGAGC TGCAGGTTTG AAGGAAAAGA 200

CCCTGATTCT ACTTGATGTG AGCACCAAGA ACCCAGTCAG GACAGTCAAT 250

GAGAACTTCC TCTCTCTGCA GCTGGATCCG TCCATCATTC ATGATGGCTG 300

GCTCGATTTC CTAAGCTCCA AGCGCTTGGT GACCCTGGCC CGGGGACTTT 350

CGCCCGCCTT TCTGCGCTTC GGGGGCAAAA GGACCGACTT CCTGCAGTTC 400

CAGAACCTGA GGAACCCGGC GAAAAGCCGC GGGGGCCCGG GCCCGGATTA 450

CTATCTCAAA AACTATGAGG ATGCCAGGTC TCTAGACAAA CTTTATAACT 500

TTGCTGATTG CTCTGGACTC CACCTGATAT TTGCTCTAAA TGCACTGCGT 550

CGTAATCCCA ATAACTCCTG GAACAGTTCT AGTGCCCTGA GTCTGTTGAA 600

GTACAGCGCC AGCAAAAAGT ACAACATTTC TTGGGAACTG GGTAATGAGC 650

CAAATAACTA TCGGACCATG CATGGCCGGG CAGTAAATGG CAGCCAGTTG 700

GGAAAGGATT ACATCCAGCT GAAGAGCCTG TTGCAGCCCA TCCGGATTTA 750

TTCCAGAGCC AGCTTATATG GCCCTAATAT TGGGCGGCCG AGGAAGAATG 800

TCATCGCCCT CCTAGATGGA TTCATGAAGG TGGCAGGAAG TACAGTAGAT 850

GCAGTTACCT GGCAACATTG CTACATTGAT GGCCGGGTGG TCAAGGTGAT 900

GGACTTCCTG AAAACTCGCC TGTTAGACAC ACTCTCTGAC CAGATTAGGA 950

AAATTCAGAA AGTGGTTAAT ACATACACTC CAGGAAAGAA GATTTGGCTT 1000

GAAGGTGTGG TGACCACCTC AGCTGGAGGC ACAAACAATC TATCCGATTC 1050

CTATGCTGCA GGATTCTTAT GGTTGAACAC TTTAGGAATG CTGGCCAATC 1100

AGGGCATTGA TGTCGTGATA CGGCACTCAT TTTTTGACCA TGGATACAAT 1150

CACCTCGTGG ACCAGAATTT TAACCCATTA CCAGACTACT GGCTCTCTCT 1200

CCTCTACAAG CGCCTGATCG GCCCCAAAGT CTTGGCTGTG CATGTGGCTG 1250

GGCTCCAGCG GAAGCCACGG CCTGGCCGAG TGATCCGGGA CAAACTAAGG 1300

ATTTATGCTC ACTGCACAAA CCACCACAAC CACAACTACG TTCGTGGGTC 1350

CATTACACTT TTTATCATCA ACTTGCATCG ATCAAGAAAG AAAATCAAGC 1400

TGGCTGGGAC TCTCAGAGAC AAGCTGGTTC ACCAGTACCT GCTGCAGCCC 1450

TATGGGCAGG AGGGCCTAAA GTCCAAGTCA GTGCAACTGA ATGGCCAGCC 1500

CTTAGTGATG GTGGACGACG GGACCCTCCC AGAATTGAAG CCCCGCCCCC 1550

TTCGGGCCGG CCGGACATTG GTCATCCCTC CAGTCACCAT GGGCTTTTTT 1600

GTGGTCAAGA ATGTCAATGC TTTGGCCTGC CGCTACCGAT AAGCTATCCT 1650

CACACTCATG GCTACCAGTG GGCCTGCTGG GCTGCTTCCA CTCCTCCACT 1700

CCAGTAGTAT CCTCTGTTTT CAGACATCCT AGCAACCAGC CCCTGCTGCC 1750

CCATCCTGCT GGAATCAACA CAGACTTGCT CTCCAAAGAG ACTAAATGTC 1800

ATAGCGTGAT CTTAGCCTAG GTAGGCCACA TCCATCCCAA AGGAAAATGT 1850

AGACATCACC TGTACCTATA TAAGGATAAA GGCATGTGTA TAGAGCAA 1898



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 538

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

Met Arg Val Leu Cys Ala Phe Pro Glu Ala Met Pro Ser Ser Asn Ser
1 5 10 15

Arg Pro Pro Ala Cys Leu Ala Pro Gly Ala Leu Tyr Leu Ala Leu Leu
20 25 30

Leu His Leu Ser Leu Ser Ser Gln Ala Gly Asp Arg Arg Pro Leu Pro
35 40 45

Val Asp Arg Ala Ala Gly Leu Lys Glu Lys Thr Leu Ile Leu Leu Asp
50 55 60

Val Ser Thr Lys Asn Pro Val Arg Thr Val Asn Glu Asn Phe Leu Ser
65 70 75 80

Leu Gln Leu Asp Pro Ser Ile Ile His Asp Gly Trp Leu Asp Phe Leu
85 90 95

Ser Ser Lys Arg Leu Val Thr Leu Ala Arg Gly Leu Ser Pro Ala Phe
100 105 110

Leu Arg Phe Gly Gly Lys Arg Thr Asp Phe Leu Gln Phe Gln Asn Leu
115 120 125

Arg Asn Pro Ala Lys Ser Arg Gly Gly Pro Gly Pro Asp Tyr Tyr Leu
130 135 140

Lys Asn Tyr Glu Asp Ala Arg Ser Leu Asp Lys Leu Tyr Asn Phe Ala
145 150 155 160

Asp Cys Ser Gly Leu His Leu Ile Phe Ala Leu Asn Ala Leu Arg Arg
165 170 175

Asn Pro Asn Asn Ser Trp Asn Ser Ser Ser Ala Leu Ser Leu Leu Lys
180 185 190

Tyr Ser Ala Ser Lys Lys Tyr Asn Ile Ser Trp Glu Leu Gly Asn Glu
195 200 205

Pro Asn Asn Tyr Arg Thr Met His Gly Arg Ala Val Asn Gly Ser Gln
210 215 220

Leu Gly Lys Asp Tyr Ile Gln Leu Lys Ser Leu Leu Gln Pro Ile Arg
225 230 235 240

Ile Tyr Ser Arg Ala Ser Leu Tyr Gly Pro Asn Ile Gly Arg Pro Arg
245 250 255

Lys Asn Val Ile Ala Leu Leu Asp Gly Phe Met Lys Val Ala Gly Ser
260 265 270

Thr Val Asp Ala Val Thr Trp Gln His Cys Tyr Ile Asp Gly Arg Val
275 280 285

Val Lys Val Met Asp Phe Leu Lys Thr Arg Leu Leu Asp Thr Leu Ser
290 295 300

Asp Gln Ile Arg Lys Ile Gln Lys Val Val Asn Thr Tyr Thr Pro Gly
305 310 315 320

Lys Lys Ile Trp Leu Glu Gly Val Val Thr Thr Ser Ala Gly Gly Thr
325 330 335

Asn Asn Leu Ser Asp Ser Tyr Ala Ala Gly Phe Leu Trp Leu Asn Thr
340 345 350

Leu Gly Met Leu Ala Asn Gln Gly Ile Asp Val Val Ile Arg His Ser
355 360 365

Phe Phe Asp His Gly Tyr Asn His Leu Val Asp Gln Asn Phe Asn Pro
370 375 380

Leu Pro Asp Tyr Trp Leu Ser Leu Leu Tyr Lys Arg Leu Ile Gly Pro
385 390 395 400

Lys Val Leu Ala Val His Val Ala Gly Leu Gln Arg Lys Pro Arg Pro
405 410 415

Gly Arg Val Ile Arg Asp Lys Leu Arg Ile Tyr Ala His Cys Thr Asn
420 425 430

His His Asn His Asn Tyr Val Arg Gly Ser Ile Thr Leu Phe Ile Ile
435 440 445

Asn Leu His Arg Ser Arg Lys Lys Ile Lys Leu Ala Gly Thr Leu Arg
450 455 460

Asp Lys Leu Val His Gln Tyr Leu Leu Gln Pro Tyr Gly Gln Glu Gly
465 470 475 480

Leu Lys Ser Lys Ser Val Gln Leu Asn Gly Gln Pro Leu Val Met Val
485 490 495

Asp Asp Gly Thr Leu Pro Glu Leu Lys Pro Arg Pro Leu Arg Ala Gly
500 505 510

Arg Thr Leu Val Ile Pro Pro Val Thr Met Gly Phe Phe Val Val Lys
515 520 525

Asn Val Asn Ala Leu Ala Cys Arg Tyr Arg
530 535







(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1724

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

CGCTTAATTC TAGAAGAGGG ATTGAATGAG GGTGCTTTGT GCCTTCCCTG 50 

AAGCCATGCC CTCCAGCAAC TCCCGCCCCC CCGCGTGCCT AGCCCCGGGG 100 

GCTCTCTACT TGGCTCTGTT GCTCCATCTC TCCCTTTCCT CCCAGGCTGG 150 

AGACAGGAGA CCCTTGCCTG TAGACAGAGC TGCAGGTTTG AAGGAAAAGA 200 

CCCTGATTCT ACTTGATGTG AGCACCAAGA ACCCAGTCAG GACAGTCAAT 250

GAGAACTTCC TCTCTCTGCA GCTGGATCCG TCCATCATTC ATGATGGCTG 300 

GCTCGATTTC CTAAGCTCCA AGCGCTTGGT GACCCTGGCC CGGGGACTTT 350 

CGCCCGCCTT TCTGCGCTTC GGGGGCAAAA GGACCGACTT CCTGCAGTTC 400

CAGAACCTGA GGAACCCGGC GAAAAGCCGC GGGGGCCCGG GCCCGGATTA 450

CTATCTCAAA AACTATGAGG ATGAGCCAAA TAACTATCGG ACCATGCATG 500 

GCCGGGCAGT AAATGGCAGC CAGTTGGGAA AGGATTACAT CCAGCTGAAG 550 

AGCCTGTTGC AGCCCATCCG GATTTATTCC AGAGCCAGCT TATATGGCCC 600 

TAATATTGGG CGGCCGAGGA AGAATGTCAT CGCCCTCCTA GATGGATTCA 650 

TGAAGGTGGC AGGAAGTACA GTAGATGCAG TTACCTGGCA ACATTGCTAC 700 

ATTGATGGCC GGGTGGTCAA GGTGATGGAC TTCCTGAAAA CTCGCCTGTT 750 

AGACACACTC TCTGACCAGA TTAGGAAAAT TCAGAAAGTG GTTAATACAT 800 

ACACTCCAGG AAAGAAGATT TGGCTTGAAG GTGTGGTGAC CACCTCAGCT 850 

GGAGGCACAA ACAATCTATC CGATTCCTAT GCTGCAGGAT TCTTATGGTT 900 

GAACACTTTA GGAATGCTGG CCAATCAGGG CATTGATGTC GTGATACGGC 950

ACTCATTTTT TGACCATGGA TACAATCACC TCGTGGACCA GAATTTTAAC 1000 

CCATTACCAG ACTACTGGCT CTCTCTCCTC TACAAGCGCC TGATCGGCCC 1050 

CAAAGTCTTG GCTGTGCATG TGGCTGGGCT CCAGCGGAAG CCACGGCCTG 1100 

GCCGAGTGAT CCGGGACAAA CTAAGGATTT ATGCTCACTG CACAAACCAC 1150 

CACAACCACA ACTACGTTCG TGGGTCCATT ACACTTTTTA TCATCAACTT 1200

GCATCGATCA AGAAAGAAAA TCAAGCTGGC TGGGACTCTC AGAGACAAGC 1250 

TGGTTCACCA GTACCTGCTG CAGCCCTATG GGCAGGAGGG CCTAAAGTCC 1300 

AAGTCAGTGC AACTGAATGG CCAGCCCTTA GTGATGGTGG ACGACGGGAC 1350 

CCTCCCAGAA TTGAAGCCCC GCCCCCTTCG GGCCGGCCGG ACATTGGTCA 1400

TCCCTCCAGT CACCATGGGC TTTTTTGTGG TCAAGAATGT CAATGCTTTG 1450

GCCTGCCGCT ACCGATAAGC TATCCTCACA CTCATGGCTA CCAGTGGGCC 1500 

TGCTGGGCTG CTTCCACTCC TCCACTCCAG TAGTATCCTC TGTTTTCAGA 1550 

CATCCTAGCA ACCAGCCCCT GCTGCCCCAT CCTGCTGGAA TCAACACAGA 1600 

CTTGCTCTCC AAAGAGACTA AATGTCATAG CGTGATCTTA GCCTAGGTAG 1650 

GCCACATCCA TCCCAAAGGA AAATGTAGAC ATCACCTGTA CCTATATAAG 1700 

GATAAAGGCA TGTGTATAGA GCAA 1724 

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 480

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Met Arg Val Leu Cys Ala Phe Pro Glu Ala Met Pro Ser Ser Asn Ser
1 5 10 15

Arg Pro Pro Ala Cys Leu Ala Pro Gly Ala Leu Tyr Leu Ala Leu Leu
20 25 30

Leu His Leu Ser Leu Ser Ser Gln Ala Gly Asp Arg Arg Pro Leu Pro
35 40 45

Val Asp Arg Ala Ala Gly Leu Lys Glu Lys Thr Leu Ile Leu Leu Asp
50 55 60

Val Ser Thr Lys Asn Pro Val Arg Thr Val Asn Glu Asn Phe Leu Ser
65 70 75 80

Leu Gln Leu Asp Pro Ser Ile Ile His Asp Gly Trp Leu Asp Phe Leu
85 90 95

Ser Ser Lys Arg Leu Val Thr Leu Ala Arg Gly Leu Ser Pro Ala Phe
100 105 110

Leu Arg Phe Gly Gly Lys Arg Thr Asp Phe Leu Gln Phe Gln Asn Leu
115 120 125

Arg Asn Pro Ala Lys Ser Arg Gly Gly Pro Gly Pro Asp Tyr Tyr Leu
130 135 140

Lys Asn Tyr Glu Asp Glu Pro Asn Asn Tyr Arg Thr Met His Gly Arg
145 150 155 160

Ala Val Asn Gly Ser Gln Leu Gly Lys Asp Tyr Ile Gln Leu Lys Ser
165 170 175

Leu Leu Gln Pro Ile Arg Ile Tyr Ser Arg Ala Ser Leu Tyr Gly Pro
180 185 190

Asn Ile Gly Arg Pro Arg Lys Asn Val Ile Ala Leu Leu Asp Gly Phe
195 200 205

Met Lys Val Ala Gly Ser Thr Val Asp Ala Val Thr Trp Gln His Cys
210 215 220

Tyr Ile Asp Gly Arg Val Val Lys Val Met Asp Phe Leu Lys Thr Arg
225 230 235 240

Leu Leu Asp Thr Leu Ser Asp Gln Ile Arg Lys Ile Gln Lys Val Val
245 250 255

Asn Thr Tyr Thr Pro Gly Lys Lys Ile Trp Leu Glu Gly Val Val Thr
260 265 270

Thr Ser Ala Gly Gly Thr Asn Asn Leu Ser Asp Ser Tyr Ala Ala Gly
275 280 285

Phe Leu Trp Leu Asn Thr Leu Gly Met Leu Ala Asn Gln Gly Ile Asp
290 295 300

Val Val Ile Arg His Ser Phe Phe Asp His Gly Tyr Asn His Leu Val
305 310 315 320

Asp Gln Asn Phe Asn Pro Leu Pro Asp Tyr Trp Leu Ser Leu Leu Tyr
325 330 335

Lys Arg Leu Ile Gly Pro Lys Val Leu Ala Val His Val Ala Gly Leu
340 345 350

Gln Arg Lys Pro Arg Pro Gly Arg Val Ile Arg Asp Lys Leu Arg Ile
355 360 365

Tyr Ala His Cys Thr Asn His His Asn His Asn Tyr Val Arg Gly Ser
370 375 380

Ile Thr Leu Phe Ile Ile Asn Leu His Arg Ser Arg Lys Lys Ile Lys
385 390 395 400

Leu Ala Gly Thr Leu Arg Asp Lys Leu Val His Gln Tyr Leu Leu Gln
405 410 415

Pro Tyr Gly Gln Glu Gly Leu Lys Ser Lys Ser Val Gln Leu Asn Gly
420 425 430

Gln Pro Leu Val Met Val Asp Asp Gly Thr Leu Pro Glu Leu Lys Pro
435 440 445

Arg Pro Leu Arg Ala Gly Arg Thr Leu Val Ile Pro Pro Val Thr Met
450 455 460

Gly Phe Phe Val Val Lys Asn Val Asn Ala Leu Ala Cys Arg Tyr Arg
465 470 475 480

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 351 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
GTTCGGCAGA GGATCATGTC TGATGTACAG AGACATTGTC CGGAGTGATG 50

TTGCCTTGGA CAAGCAGAAA GGCTGTAAGA TTGGCCAGCA CCCTGATGTC 100 

ATGCTGGAGC TCCAGAGAGA GAAGGCATCC AGACTGTCTG GTTCTTCTGA 150 

AGGAGCAATA CTCCAATACT TACAGTAACC TCATATTAAC AGGTCTCTAG 200 

ACAAACTTTA TAACTTTGCT GATTGCTCTG GACTCCACCT GATATTTGCT 250 

CTAAATGCAC TGCGTCGTAA TCCCAATAAC TCCTGGAACA GTTCTAGTGC 300 

CCTGAGCCTG TTGAAGTACA GTGCCAGCAA AAAGTACAAC ATTTCTTGGG 350 

A 351 

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 543

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

Met Leu Leu Arg Ser Lys Pro Ala Leu Pro Pro Pro Leu Met Leu Leu
1 5 10 15

Leu Leu Gly Pro Leu Gly Pro Leu Ser Pro Gly Ala Leu Pro Arg Pro
20 25 30

Ala Gln Ala Gln Asp Val Val Asp Leu Asp Phe Phe Thr Gln Glu Pro
35 40 45

Leu His Leu Val Ser Pro Ser Phe Leu Ser Val Thr Ile Asp Ala Asn
50 55 60

Leu Ala Thr Asp Pro Arg Phe Leu Ile Leu Leu Gly Ser Pro Lys Leu
65 70 75 80

Arg Thr Leu Ala Arg Gly Leu Ser Pro Ala Tyr Leu Arg Phe Gly Gly
85 90 95

Thr Lys Thr Asp Phe Leu Ile Phe Asp Pro Lys Lys Glu Ser Thr Phe
100 105 110

Glu Glu Arg Ser Tyr Trp Gln Ser Gln Val Asn Gln Asp Ile Cys Lys
115 120 125

Tyr Gly Ser Ile Pro Pro Asp Val Glu Glu Lys Leu Arg Leu Glu Trp
130 135 140

Pro Tyr Gln Glu Gln Leu Leu Leu Arg Glu His Tyr Gln Lys Lys Phe
145 150 155 160

Lys Asn Ser Thr Tyr Ser Arg Ser Ser Val Asp Val Leu Tyr Thr Phe
165 170 175

Ala Asn Cys Ser Gly Leu Asp Leu Ile Phe Gly Leu Asn Ala Leu Leu
180 185 190

Arg Thr Ala Asp Leu Gln Trp Asn Ser Ser Asn Ala Gln Leu Leu Leu
195 200 205

Asp Tyr Cys Ser Ser Lys Gly Tyr Asn Ile Ser Trp Glu Leu Gly Asn
210 215 220

Glu Pro Asn Ser Phe Leu Lys Lys Ala Asp Ile Phe Ile Asn Gly Ser
225 230 235 240

Gln Leu Gly Glu Asp Tyr Ile Gln Leu His Lys Leu Leu Arg Lys Ser
245 250 255

Thr Phe Lys Asn Ala Lys Leu Tyr Gly Pro Asp Val Gly Gln Pro Arg
260 265 270

Arg Lys Thr Ala Lys Met Leu Lys Ser Phe Leu Lys Ala Gly Gly Glu
275 280 285

Val Ile Asp Ser Val Thr Trp His His Tyr Tyr Leu Asn Gly Arg Thr
290 295 300

Ala Thr Arg Glu Asp Phe Leu Asn Pro Asp Val Leu Asp Ile Phe Ile
305 310 315 320

Ser Ser Val Gln Lys Val Phe Gln Val Val Glu Ser Thr Arg Pro Gly
325 330 335

Lys Lys Val Trp Leu Gly Glu Thr Ser Ser Ala Tyr Gly Gly Gly Ala
340 345 350

Pro Leu Leu Ser Asp Thr Phe Ala Ala Gly Phe Met Trp Leu Asp Lys
355 360 365

Leu Gly Leu Ser Ala Arg Met Gly Ile Glu Val Val Met Arg Gln Val
370 375 380

Phe Phe Gly Ala Gly Asn Tyr His Leu Val Asp Glu Asn Phe Asp Pro
385 390 395 400

Leu Pro Asp Tyr Trp Leu Ser Leu Leu Phe Lys Lys Leu Val Gly Thr
405 410 415

Lys Val Leu Met Ala Ser Val Gln Gly Ser Lys Arg Arg Lys Leu Arg
420 425 430

Val Tyr Leu His Cys Thr Asn Thr Asp Asn Pro Arg Tyr Lys Glu Gly
435 440 445

Asp Leu Thr Leu Tyr Ala Ile Asn Leu His Asn Val Thr Lys Tyr Leu
450 455 460

Arg Leu Pro Tyr Pro Phe Ser Asn Lys Gln Val Asp Lys Tyr Leu Leu
465 470 475 480

Arg Pro Leu Gly Pro His Gly Leu Leu Ser Lys Ser Val Gln Leu Asn
485 490 495

Gly Leu Thr Leu Lys Met Val Asp Asp Gln Thr Leu Pro Pro Leu Met
500 505 510

Glu Lys Pro Leu Arg Pro Gly Ser Ser Leu Gly Leu Pro Ala Phe Ser
515 520 525

Tyr Ser Phe Phe Val Ile Arg Asn Ala Lys Val Ala Ala Cys Ile
530 535 540



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
GGAGAGCAAG TCTGTGTTGA TTC 23 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
CACTGGTAGC CATGAGTGTG AG 22 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
TTGGTCATCC CTCCAGTCAC CA 22 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Asp Glu 



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
CTTGCCTGTA GACAGAGCTG CAG 23 



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2396 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
TTTCTAGTTG CTTTTAGCCA ATGTCGGATC AGGTTTTTCA AGCGACAAAG 50 

AGATACTGAG ATCCTGGGCA GAGGACATCC TAGCTCGGTC AGATTTGGGC 100 

AGGCTCAAGT GACCAGTGTC TTAAGGCAGA AGGGAGTCGG GGTAGGGTCT 150 

GGCTGAACCC TCAACCGGGG CTTTTAACTC AGGGTCTAGT CCTGGCGCCA 200 

AATGGATGGG ACCTAGAAAA GGTGACAGAG TGCGCAGGAC ACCAGGAAGC 250

TGGTCCCACC CCTGCGCGGC TCCCGGGCGC TCCCTCCCCA GGCCTCCGAG 300 

GATCTTGGAT TCTGGCCACC TCCGCACCCT TTGGATGGGT GTGGATGATT 350

TCAAAAGTGG ACGTGACCGC GGCGGAGGGG AAAGCCAGCA CGGAAATGAA 400

AGAGAGCGAG GAGGGGAGGG CGGGGAGGGG AGGGCGCTAG GGAGGGACTC 450

CCGGGAGGGG TGGGAGGGAT GGAGCGCTGT GGGAGGGTAC TGAGTCCTGG 500 

CGCCAGAGGC GAAGCAGGAC CGGTTGCAGG GGGCTTGAGC CAGCGCGCCG 550 

GCTGCCCCAG CTCTCCCGGC AGCGGGCGGT CCAGCCAGGT GGGATGCTGA 600 

GGCTGCTGCT GCTGTGGCTC TGGGGGCCGC TCGGTGCCCT GGCCCAGGGC 650 

GCCCCCGCGG GGACCGCGCC GACCGACGAC GTGGTAGACT TGGAGTTTTA 700 

CACCAAGCGG CCGCTCCGAA GCGTGAGTCC CTCGTTCCTG TCCATCACCA 750 

TCGACGCCAG CCTGGCCACC GACCCGCGCT TCCTCACCTT CCTGGGCTCT 800 

CCAAGGCTCC GTGCTCTGGC TAGAGGCTTA TCTCCTGCAT ACTTGAGATT 850 

TGGCGGCACA AAGACTGACT TCCTTATTTT TGATCCGGAC AAGGAACCGA 900 

CTTCCGAAGA AAGAAGTTAC TGGAAATCTC AAGTCAACCA TGATATTTGC 950 

AGGTCTGAGC CGGTCTCTGC TGCGGTGTTG AGGAAACTCC AGGTGGAATG 1000 

GCCCTTCCAG GAGCTGTTGC TGCTCCGAGA GCAGTACCAA AAGGAGTTCA 1050 

AGAACAGCAC CTACTCAAGA AGCTCAGTGG ACATGCTCTA CAGTTTTGCC 1100

AAGTGCTCGG GGTTAGACCT GATCTTTGGT CTAAATGCGT TACTACGAAC 1150

CCCAGACTTA CGGTGGAACA GCTCCAACGC CCAGCTTCTC CTTGACTACT 1200

GCTCTTCCAA GGGTTATAAC ATCTCCTGGG AACTGGGCAA TGAGCCCAAC 1250

AGTTTCTGGA AGAAAGCTCA CATTCTCATC GATGGGTTGC AGTTAGGAGA 1300

AGACTTTGTG GAGTTGCATA AACTTCTACA AAGGTCAGCT TTCCAAAATG 1350

CAAAACTCTA TGGTCCTGAC ATCGGTCAGC CTCGAGGGAA GACAGTTAAA 1400

CTGCTGAGGA GTTTCCTGAA GGCTGGCGGA GAAGTGATCG ACTCTCTTAC 1450

ATGGCATCAC TATTACTTGA ATGGACGCAT CGCTACCAAA GAAGATTTTC 1500

TGAGCTCTGA TGCGCTGGAC ACTTTTATTC TCTCTGTGCA AAAAATTCTG 1550

AAGGTCACTA AAGAGATCAC ACCTGGCAAG AAGGTCTGGT TGGGAGAGAC 1600

GAGCTCAGCT TACGGTGGCG GTGCACCCTT GCTGTCCAAC ACCTTTGCAG 1650

CTGGCTTTAT GTGGCTGGAT AAATTGGGCC TGTCAGCCCA GATGGGCATA 1700

GAAGTCGTGA TGAGGCAGGT GTTCTTCGGA GCAGGCAACT ACCACTTAGT 1750

GGATGAAAAC TTTGAGCCTT TACCTGATTA CTGGCTCTCT CTTCTGTTCA 1800

AGAAACTGGT AGGTCCCAGG GTGTTACTGT CAAGAGTGAA AGGCCCAGAC 1850

AGGAGCAAAC TCCGAGTGTA TCTCCACTGC ACTAACGTCT ATCACCCACG 1900

ATATCAGGAA GGAGATCTAA CTCTGTATGT CCTGAACCTC CATAATGTCA 1950

CCAAGCACTT GAAGGTACCG CCTCCGTTGT TCAGGAAACC AGTGGATACG 2000

TACCTTCTGA AGCCTTCGGG GCCGGATGGA TTACTTTCCA AATCTGTCCA 2050

ACTGAACGGT CAAATTCTGA AGATGGTGGA TGAGCAGACC CTGCCAGCTT 2100

TGACAGAAAA ACCTCTCCCC GCAGGAAGTG CACTAAGCCT GCCTGCCTTT 2150

TCCTATGGTT TTTTTGTCAT AAGAAATGCC AAAATCGCTG CTTGTATATG 2200

AAAATAAAAG GCATACGGTA CCCCTGAGAC AAAAGCCGAG GGGGGTGTTA 2250

TTCATAAAAC AAAACCCTAG TTTAGGAGGC CACCTCCTTG CCGAGTTCCA 2300

GAGCTTCGGG AGGGTGGGGT ACACTTCAGT ATTACATTCA GTGTGGTGTT 2350

CTCTCTAAGA AGAATACTGC AGGTGGTGAC AGTTAATAGC ACTGTG 2396



(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
GAGCAGCCAG GTGAGCCCAA GA 22



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
TCAGATGCAA GCAGCAACTT TGGC 24 



(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
CACCCTGATG TCATGCTGGA G 21



(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
CATCTAGGAG AGCAATGACG TTC 23



(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
CCATCCTAAT ACGACTCACT ATAGGGC 27



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
ACTCACTATA GGGCTCGAGC GGC 23 

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
TTTTTTTTTT TTTTT 15 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 560 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
GGCACGAGGC TAGTGGAGAG ACTGACAAGC AGTCAGCTCA GCGGTCACAA 50 
TACTGTGTGA CAGGAGCTGA GATCCAAGAA GTACTGGGTC CTGTGGGAGC 100 
ACCCCTGACT TGAAGGACAA GTCAGTGCAA CTGAATGGCC AGCCCTTAGT 150 
GATGGTGGAC GACGGGACCC TCCCAGAATT GAAGCCCCGC CCCCTTCGGG 200 
CCGGCCGGAC ATTGGTCATC CCTCCAGTCA CCATGGGCTT TTTTGTGGTC 250 
AAGAATGTCA ATGCTTTGGC CTGCCGCTAC CGATAAGCTA TCCTCACACT 300 
CATGGCTACC AGTGGGCCTG CTGGGCTGCT TCCACTCCTC CACTCCAGTA 350 
GTATCCTCTG TTTTCAGACA TCCTAGCAAC CAGCCCCTGC TGCCCCATCC 400
TGCTGGAATC AACACAGACT TGCTCTCCAA AGAGACTAAA TGTCATAGCG 450
TGATCTTAGC CTAGGTAGGC CACATCCATC CCAAAGGAAA ATGTAGACAT 500
CACCTGTACC TATATAAGGA TAAAGGCATG TGTATAGAGC AAAAAAAAAA 550
AAAAAAAAAA 560 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1721 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
CTAGAGCTTT CGACTCTCCG CTGCGCGGCA GCTGGCGGGG GGAGCAGCCA GGTGAGCCCA 60 
AGATGCTGCT GCGCTCGAAG CCTGCGCTGC CGCCGCCGCT GATGCTGCTG CTCCTGGGGC 120 
CGCTGGGTCC CCTCTCCCCT GGCGCCCTGC CCCGACCTGC GCAAGCACAG GACGTCGTGG 180 
ACCTGGACTT CTTCACCCAG GAGCCGCTGC ACCTGGTGAG CCCCTCGTTC CTGTCCGTCA 240
CCATTGACGC CAACCTGGCC ACGGACCCGC GGTTCCTCAT CCTCCTGGGT TCTCCAAAGC 300 
TTCGTACCTT GGCCAGAGGC TTGTCTCCTG CGTACCTGAG GTTTGGTGGC ACCAAGACAG 360 
ACTTCCTAAT TTTCGATCCC AAGAAGGAAT CAACCTTTGA AGAGAGAAGT TACTGGCAAT 420
CTCAAGTCAA CCAGGATATT TGCAAATATG GATCCATCCC TCCTGATGTG GAGGAGAAGT 480 
TACGGTTGGA ATGGCCCTAC CAGGAGCAAT TGCTACTCCG AGAACACTAC CAGAAAAAGT 540 
TCAAGAACAG CACCTACTCA AGAAGCTCTG TAGATGTGCT ATACACTTTT GCAAACTGCT 600
CAGGACTGGA CTTGATCTTT GGCCTAAATG CGTTATTAAG AACAGCAGAT TTGCAGTGGA 660 
ACAGTTCTAA TGCTCAGTTG CTCCTGGACT ACTGCTCTTC CAAGGGGTAT AACATTTCTT 720 
GGGAACTAGG CAATGAACCT AACAGTTTCC TTAAGAAGGC TGATATTTTC ATCAATGGGT 780 
CGCAGTTAGG AGAAGATTAT ATTCAATTGC ATAAACTTCT AAGAAAGTCC ACCTTCAAAA 840
ATGCAAAACT CTATGGTCCT GATGTTGGTC AGCCTCGAAG AAAGACGGCT AAGATGCTGA 900 
AGAGCTTCCT GAAGGCTGGT GGAGAAGTGA TTGATTCAGT TACATGGCAT CACTACTATT 960 
TGAATGGACG GACTGCTACC AGGGAAGATT TTCTAAACCC TGATGTATTG GACATTTTTA 1020 
TTTCATCTGT GCAAAAAGTT TTCCAGGTGG TTGAGAGCAC CAGGCCTGGC AAGAAGGTCT 1080 
GGTTAGGAGA AACAAGCTCT GCATATGGAG GCGGAGCGCC CTTGCTATCC GACACCTTTG 1140 
CAGCTGGCTT TATGTGGCTG GATAAATTGG GCCTGTCAGC CCGAATGGGA ATAGAAGTGG 1200 
TGATGAGGCA AGTATTCTTT GGAGCAGGAA ACTACCATTT AGTGGATGAA AACTTCGATC 1260
CTTTACCTGA TTATTGGCTA TCTCTTCTGT TCAAGAAATT GGTGGGCACC AAGGTGTTAA 1320 
TGGCAAGCGT GCAAGGTTCA AAGAGAAGGA AGCTTCGAGT ATACCTTCAT TGCACAAACA 1380 
CTGACAATCC AAGGTATAAA GAAGGAGATT TAACTCTGTA TGCCATAAAC CTCCATAACG 1440
TCACCAAGTA CTTGCGGTTA CCCTATCCTT TTTCTAACAA GCAAGTGGAT AAATACCTTC 1500 
TAAGACCTTT GGGACCTCAT GGATTACTTT CCAAATCTGT CCAACTCAAT GGTCTAACTC 1560 
TAAAGATGGT GGATGATCAA ACCTTGCCAC CTTTAATGGA AAAACCTCTC CGGCCAGGAA 1620 
GTTCACTGGG CTTGCCAGCT TTCTCATATA GTTTTTTTGT GATAAGAAAT GCCAAAGTTG 1680 
CTGCTTGCAT CTGAAAATAA AATATACTAG TCCTGACACT G 1721 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
CTTACTTGTC ATCGTCGTCC TTGTAGTCTC GGTAGCGGCA GGCCA 45 



